By a mixture density is meant a density of the form $\pi_\mu(\cdot) = \int \pi_\theta(\cdot)\,\mu(d\theta)$, where $(\pi_\theta)_{\theta\in\Theta}$ is a family of probability densities and $\mu$ is a probability measure on $\Theta$. We consider the problem of identifying the unknown part of this model, the mixing distribution $\mu$, from a finite sample of independent observations from $\pi_\mu$. Assuming that the mixing distribution has a density function, we wish to estimate this density within appropriate function classes. A general approach is proposed and its scope of application is investigated in the case of discrete distributions. Mixtures of power series distributions are more specifically studied. Standard methods for density estimation, such as kernel estimators, are available in this context, and it has been shown that these methods are rate optimal or almost rate optimal in balls of various smoothness spaces. For instance, these results apply to mixtures of the Poisson distribution parameterized by its mean. Estimators based on orthogonal polynomial sequences have also been proposed and shown to achieve similar rates. The general approach of this paper extends and simplifies such results. For instance, it allows us to prove asymptotic minimax efficiency over certain smoothness classes of the above-mentioned polynomial estimator in the Poisson case. We also study discrete location mixtures, or discrete deconvolution, and mixtures of discrete uniform distributions.



1. Introduction. Let $(\mathcal{X},\mathcal{F})$ be a measurable space and let $(\pi_\theta)_{\theta\in\Theta}$ be a parametric family of densities on $\mathcal{X}$ with respect to a common measure $\zeta$. The parameter $\theta$ is assumed to range over a set $\Theta \in \mathcal{B}(\mathbb{R}^d)$; here $d \ge 1$ and $\mathcal{B}(\cdot)$ denotes the Borel sets. For any probability measure $\mu$ on $(\Theta,\mathcal{B}(\Theta))$, the mixture density $\pi_\mu$ is defined on $\mathcal{X}$ by

$$\pi_\mu(x) = \int_\Theta \pi_\theta(x)\,\mu(d\theta).$$

Here the family $(\pi_\theta)$ is called the mixands and $\mu$ is the mixing distribution. If $\mu$ has finite support, $\pi_\mu$ is called a finite mixture (density). Estimation of such mixtures from an i.i.d. sequence $(X_i)_{1\le i\le n}$, distributed according to $\pi_\mu$, with the aim of recovering the unknown support points, their weights and maybe also their number, has a long history and we refer to the monographs by McLachlan and Peel [18], Titterington, Smith and Makov [20] and Lindsay [15] for further reading. In the present paper we are interested in nonparametric estimation of the mixing distribution $\mu$. We will assume that each such distribution under consideration has a density, called the mixing density and denoted by $f$, with respect to a known reference (Radon) measure $\nu$ on $(\Theta,\mathcal{B}(\Theta))$.

The problem of estimating $f$ for mixtures of discrete distributions ($\mathcal{X}$ is discrete) has been investigated, for instance, by Zhang [23] and, for Poisson mixtures with $\nu$ being Lebesgue measure, by Hengartner [12]; see also references in these two articles. The estimators examined by these authors are of two sorts. Zhang [23] used a kernel density estimator and adapted it to the mixture setting to estimate $f$ pointwise. Hengartner [12] used a projection estimator based on orthogonal polynomials to obtain an estimator of $f$ as an element of $L^2[a,b]$, $0 < a < b < \infty$. Loh and Zhang [16] used the kernel estimator to derive estimators of $f$ in the two cases $f \in L^p[0,b]$ and $f \in L^p[0,\infty)$ with $1 \le p \le \infty$. The main results of these works are concerned with establishing rates of convergence of the estimators, depending on smoothness conditions assumed on the mixing density, and with establishing bounds on the achievable minimax rate for mixing densities within balls defined by similar smoothness conditions.

The results on both estimators were condensed and slightly generalized 
by Loh and Zhang [17], who also carried out a numerical study of their finite 
sample performance. A conclusion of their work is that, although both types 
of estimators achieve similar rates with similar smoothness conditions on the 
mixing density, projection estimators seem to behave much better for finite 
samples. As pointed out by Loh and Zhang [17], the rates being logarithmic, 
it is not surprising that identical rates do not imply similar performance for 
finite sample sizes. 

Another important point of the works cited above is that, although the 
rates of the estimators are derived over a wide range of smoothness classes, 
minimax rate optimality is proved only for particular instances. For example, 
Hengartner [12] obtained the rate of the projection estimator over Sobolev 
classes with arbitrary index of smoothness, but proved this rate to be min- 
imax optimal for integer indices only. Similar remarks apply to the results 
of Loh and Zhang [17], but for a family of ellipsoidal classes. 
In this paper we develop a general framework for studying projection estimators, with the main focus on mixtures of discrete distributions. Let us denote by $\Pi$ the linear operator mapping a real function $h$ on $\mathcal{X}$ to a real function $\Pi h$ on $\Theta$, defined by

$$\Pi h(\theta) = \pi_\theta h = \int_{\mathcal{X}} h\,\pi_\theta\,d\zeta \quad \text{for all } \theta \text{ in } \Theta, \tag{1}$$

whenever this integral is well defined. Here we use the classical functional analysis notation $\pi h := \int h\,d\pi$. Above we defined $\pi_\theta$ and $\pi_\mu$ as densities on $\mathcal{X}$ with dominating measure $\zeta$, but we will also use the same notation for the corresponding probability measures. Observe that, by Fubini's theorem, for all $h$ such that $\pi_\mu|h| < \infty$,

$$\pi_\mu h = \int h(x)\left(\int_\Theta \pi_\theta(x)\,\mu(d\theta)\right)\zeta(dx) = \mu\Pi h. \tag{2}$$

The mean $\pi_\mu h$ may be estimated by a sample mean obtained using i.i.d. observations from $\pi_\mu$; see also [1], where this problem is addressed for $h$ within a given class of functions. The basic idea of what we call the projection estimator is now to estimate $\pi_\mu h$ for a suitable finite collection of functions $h$ and then to use (2) to obtain an estimate of $\mu$. The precise definition is given in Definition 1.
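Identity (2) can be checked numerically. The following is a minimal sketch, assuming (for illustration only; neither choice is prescribed by the text) Poisson mixands $\pi_\theta(x) = e^{-\theta}\theta^x/x!$ and a Gamma mixing density. It takes $h = \mathbb{1}_k$, evaluates $\mu\Pi h$ by a midpoint-rule quadrature, and compares it with the sample mean of $h$ over simulated draws from $\pi_\mu$. All function names are hypothetical.

```python
import math
import random

def poisson_pmf(x, theta):
    # Assumed mixand: pi_theta(x) = exp(-theta) theta^x / x!
    return math.exp(-theta) * theta ** x / math.factorial(x)

def gamma_pdf(theta, shape, scale):
    # Hypothetical mixing density f: a Gamma(shape, scale) density
    return theta ** (shape - 1) * math.exp(-theta / scale) / (math.gamma(shape) * scale ** shape)

def mu_Pi_h(k, shape, scale, upper=60.0, steps=60_000):
    # Right-hand side of (2) with h = 1_k: integral of pi_theta(k) f(theta) d(theta),
    # approximated by the midpoint rule on [0, upper]
    width = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width
        total += poisson_pmf(k, theta) * gamma_pdf(theta, shape, scale)
    return total * width

def sample_poisson(theta, rng):
    # Knuth's multiplicative method; adequate for moderate theta
    limit, x, prod = math.exp(-theta), 0, rng.random()
    while prod > limit:
        x += 1
        prod *= rng.random()
    return x

def sample_mixture(n, shape, scale, rng):
    # Two-stage draw: theta ~ mu, then X | theta ~ pi_theta
    return [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]

def sample_mean(h, xs):
    # P_n h = (1/n) sum_k h(X_k), the estimator of pi_mu h
    return sum(h(x) for x in xs) / len(xs)

rng = random.Random(0)
xs = sample_mixture(50_000, shape=2.0, scale=1.5, rng=rng)
k = 2
print(mu_Pi_h(k, 2.0, 1.5))                      # mu Pi 1_k, right-hand side of (2)
print(sample_mean(lambda x: float(x == k), xs))  # P_n 1_k, estimate of pi_mu 1_k
```

For this Poisson-Gamma pair the mixture is negative binomial, and both printed values should be close to its probability at $k = 2$, namely $3 \times 0.6^2 \times 0.4^2 = 0.1728$.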

Our objective, classical in a nonparametric approach, is to find the asymptotic behavior of the minimax risk

$$\inf_{\hat\mu_n\in\mathcal{S}_n}\ \sup_{\mu\in\mathcal{C}}\ \pi_\mu^{\otimes n}\,\ell(\hat\mu_n,\mu),$$

where $\mathcal{C}$, $\ell$ and $\mathcal{S}_n$, respectively, denote a class of distributions, a loss function and a set of estimators defined on $\mathcal{X}^n$ and taking values in a set compatible with the choice of $\ell$; $\pi_\mu^{\otimes n}$ is the distribution of $n$ i.i.d. observations from $\pi_\mu$. It turns out that there is a simple argument to lower-bound this quantity in a general mixture framework (Proposition 1).

However, for exploiting this lower bound and studying the projection estimator, we will, as in the papers cited above, consider the case when $\mu$ is defined by its density $f = d\mu/d\nu$ for a fixed $\nu$. In this setting we will likewise write $\pi_f$ for $\pi_\mu$. Furthermore, the density $f$ will be assumed to belong to the Hilbert space $\mathbb{H} = L^2(\nu)$ with scalar product $\langle f, g\rangle_{\mathbb{H}} = \int fg\,d\nu$. Given an estimator $\hat f : \mathcal{X}^n \to \mathbb{H}$ of $f$, it is natural to consider a risk given by the mean squared error $\mathbb{E}_f\|\hat f - f\|_{\mathbb{H}}^2$; here $\mathbb{E}_f$ denotes integration with respect to $\pi_f^{\otimes n}$ and $\|\cdot\|_{\mathbb{H}}$ is the norm on $\mathbb{H}$. In nonparametric language this is a mean integrated squared error (MISE). We will notice that, in order to arrive at interesting results, it is sensible to define the class $\mathcal{C}$ above, which is now a class of densities in $\mathbb{H}$, in accordance with the mixands. In the case of power series mixtures, this class is closely related to polynomials. Such ideas were used already by Lindsay [14] in a parametric framework. Still in the context of power series mixtures, we will obtain results on minimax rate optimality of the projection estimator, using classical results on polynomial approximations on compact sets (Theorem 3).

Having said that, we note that, quite generally, including Poisson mixands, 
the mixing density $f$ may also be estimated using nonparametric maximum
likelihood; Lindsay [15] is excellent reading on this approach. The optimization problem so obtained is an infinite-dimensional convex programming
problem, and numerical routines for approximating the nonparametric MLE
(NPMLE) can be constructed, at least in certain models. The problem with
the NPMLE is rather on the theoretical side: van de Geer [21] proved a
rate of convergence result in terms of Hellinger distance in a rather abstract 
setting, and it still remains to be determined what this result implies for the 
problems studied in the present paper. 

The paper is organized as follows. In Section 2 we give a general lower 
bound, in an abstract framework, on the obtainable error over certain classes 
of mixing distributions. This result is then specialized to the Hilbert setting 
outlined above, that is, we consider the MISE obtainable over smoothness 
classes of densities. In Section 3 we define the projection estimator and 
give a bias-variance decomposition of its loss. Section 4 focuses on mixtures
of discrete distributions, containing a main theorem that provides a lower 
bound on the minimax MISE achievable over smoothness classes related to 
the definition of the projection estimator. An upper bound is also given and 
we discuss how these two bounds apply in a common setting. In Section 5 
we apply these results to power series mixtures and complete the results 
obtained by Hengartner [12] and Loh and Zhang [17]. Section 6 is devoted 
to translation mixtures, or discrete deconvolution, while Section 7 provides 
applications of our results to mixtures of discrete uniform distributions. Fi- 
nally, in Section 8 we give some examples in which the general methodology 
of the present paper may be valuable, but which we have not explored fur- 
ther. 

Before closing this section we give some additional notation that will be used in connection with the above-mentioned Hilbert space $\mathbb{H}$. We write $\mathbb{H}_+$ for the set of nonnegative functions in $\mathbb{H}$, that is, $\mathbb{H}_+ = \{f \in \mathbb{H} : f \ge 0\}$, and $\mathbb{H}_1$ for the set of functions in $\mathbb{H}_+$ that integrate to unity, that is, $\mathbb{H}_1 = \{f \in \mathbb{H}_+ : \nu f = 1\}$. In other words, $\mathbb{H}_1$ is the set of probability densities on $\Theta$ which are also in $\mathbb{H}$. For any subset $V$ of $\mathbb{H}$, we write $V^\perp$ for the orthogonal complement of $V$ in $\mathbb{H}$, $f \perp V$ if $f \in V^\perp$, and $\mathrm{Proj}_V$ for the orthogonal projection on $V$. For two subsets $W \subseteq V$, we shall write $V \ominus W$ for $V \cap W^\perp$. A subset $V$ is called symmetric if $V = -V$, that is, if $-f$ is in $V$ whenever $f$ is.
2. A general lower bound. In this section we first give a lower bound on the obtainable loss in a more general framework, before turning to the setting specified in Section 1. To that end, let $(\mathcal{X},\mathcal{F})$ and $(\Theta,\mathcal{G})$ be measurable spaces and let the function $(\theta, A) \mapsto \pi_\theta(A)$ from $\Theta \times \mathcal{F}$ to $[0,1]$ be a probability kernel. That is, $\pi_\cdot(A)$ is measurable for all measurable subsets $A \subseteq \mathcal{X}$ and $\pi_\theta(\cdot)$ is a probability measure on $(\mathcal{X},\mathcal{F})$ for all $\theta$ in $\Theta$. For instance, $\pi$ may be a regular version of a conditional probability $\mathbb{P}(X \in \cdot \mid \theta = \cdot)$ (we refer to [19], Section 3.4, for more details).

We let $\mathcal{M}(\mathcal{X},\mathcal{F})$ and $\mathcal{M}(\Theta,\mathcal{G})$ denote the sets of all signed finite measures on $(\mathcal{X},\mathcal{F})$ and $(\Theta,\mathcal{G})$, respectively. For any set $A$, we write $\mathbb{1}_A$ for the indicator function of $A$. As in (1), the linear operator $\Pi$ maps a real function $h$ on $\mathcal{X}$ to a real function on $\Theta$ defined by $\Pi h(\theta) = \pi_\theta h$. Considering $\Pi$ acting on bounded functions, its adjoint operator $\Pi^*$ operates from $\mathcal{M}(\Theta,\mathcal{G})$ to $\mathcal{M}(\mathcal{X},\mathcal{F})$ and is given by

$$\Pi^*\mu(A) = \Pi^*\mu\,\mathbb{1}_A = \mu\Pi\mathbb{1}_A \quad \text{for all } A \in \mathcal{F}.$$

When $\mu$ is a probability measure on $\Theta$, its image by $\Pi^*$ also is a probability measure; indeed, it is the mixture distribution obtained from the mixands $(\pi_\theta)$ and mixing distribution $\mu$.

For any real function $h$ on $\mathcal{X}$, we denote by $M_h$ the multiplication operator which maps a real function $f$ on $\mathcal{X}$ to the real function defined by $M_h f(x) = h(x)f(x)$ for all $x \in \mathcal{X}$. Considering this operator acting on bounded functions, its adjoint is an operator on $\mathcal{M}(\mathcal{X},\mathcal{F})$ given by $M_h^*\mu = \mu M_h$. In other words, $M_h^*\mu$ is the measure with density $h$ with respect to $\mu$.

Next, we consider a subspace $E$ of signed measures on $\Theta$, equipped with a semi-norm $\mathcal{N}$. We assume that $E$ is endowed with a σ-field which makes this semi-norm measurable. We write $\mathcal{S}_n$ for the set of all $E$-valued estimators based on $n$ observations, that is, the set of all measurable functions from $\mathcal{X}^n$ to $E$. Finally, $E_1$ is the set of all probability measures that belong to $E$.

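Proposition 1. Let $h$ be a real nonnegative function on $\mathcal{X}$, bounded by 1. Let $\mathcal{C}$ be a symmetric set included in the kernel of $M_h^*\circ\Pi^*$ and let $\mu_0$ be a probability measure on $\Theta$. Then for any number $p \ge 1$,

$$\inf_{\hat\mu\in\mathcal{S}_n}\ \sup_{\mu\in(\mu_0+\mathcal{C})\cap E_1}(\Pi^*\mu)^{\otimes n}\mathcal{N}^p(\hat\mu - \mu) \ge \sup\{\mathcal{N}^p(\mu) : \mu \in \mathcal{C},\ \mu_0 \pm \mu \in E_1\}\,(\Pi^*\mu_0 h)^n. \tag{3}$$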

Remark. An obvious and interesting problem raised by the proposition 
is that of optimizing the right-hand side of (3) with respect to h. 
Remark. One can allow the function $h$ to depend on an index $i$ as long as $\mathcal{C}$ is a subset of the kernel of each $M_{h_i}^*\circ\Pi^*$. Doing so, the second factor in the lower bound becomes $\prod_{1\le i\le n}\Pi^*\mu_0 h_i$.

Remark. The supremum in the lower bound is also that of $\mathcal{N}^p$ over $\mathcal{C} \cap (E_1 - \mu_0) \cap (\mu_0 - E_1)$. Since this set is symmetric, the supremum is the $p$th power of its half diameter.

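Proof of Proposition 1. Write $h^{\otimes n}$ for the function on $\mathcal{X}^n$ mapping $(x_1,\dots,x_n)$ to $\prod_{i=1}^n h(x_i)$. Let $\mu$ be in the kernel of $M_h^*\circ\Pi^*$ such that $\mu_0 + \mu$ is a probability measure. Then, since $h^{\otimes n}$ is bounded by 1, for all nonnegative functions $g$ on $\mathcal{X}^n$,

$$(\Pi^*(\mu_0+\mu))^{\otimes n}g \ge (\Pi^*(\mu_0+\mu))^{\otimes n}(h^{\otimes n}g) = (M_h^*\Pi^*(\mu_0+\mu))^{\otimes n}g = (M_h^*\Pi^*\mu_0)^{\otimes n}g.$$

Now pick an estimator $\hat\mu$ in $\mathcal{S}_n$. Let $\mu$ be a signed measure in the kernel of $M_h^*\circ\Pi^*$ such that both $\mu_\pm := \mu_0 \pm \mu$ are probability measures. Applying the above inequality twice, with $g$ equal to $g_\pm := \mathcal{N}^p(\hat\mu - \mu_\pm)$, we obtain

$$(\Pi^*\mu_+)^{\otimes n}g_+ + (\Pi^*\mu_-)^{\otimes n}g_- \ge (M_h^*\Pi^*\mu_0)^{\otimes n}(g_+ + g_-). \tag{4}$$

Furthermore, by convexity of $t \mapsto t^p$ and the triangle inequality, note that

$$g_+ + g_- \ge 2^{1-p}\left(\mathcal{N}(\hat\mu - \mu_+) + \mathcal{N}(\hat\mu - \mu_-)\right)^p \ge 2^{1-p}\mathcal{N}^p(2\mu) = 2\mathcal{N}^p(\mu),$$

so that the right-hand side of (4) is at least $2\mathcal{N}^p(\mu)(\Pi^*\mu_0 h)^n$. The supremum in the left-hand side of (3) is at least half the left-hand side of (4), hence, at least $\mathcal{N}^p(\mu)(\Pi^*\mu_0 h)^n$. This corresponds to bounding the supremum risk from below by a two-point Bayes risk with uniform prior. We conclude the proof by optimizing over $\mu$. □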

We note that Proposition 1 holds for norms such as the $L^p$ norms or the total variation norm in a nondominated context. Our particular interest in this result, however, is when $\pi_\theta(A) = \int_A \pi_\theta\,d\zeta$ (what we call the dominated case) and when $E$ is the set of all finite signed measures with a density with respect to $\nu$ in $\mathbb{H} = L^2(\nu)$. As this is the main topic of the remainder of the paper, we now restate Proposition 1 in this context as a separate result. From now on, $\mathcal{S}_n$ will denote the set of estimators in $\mathbb{H}$ from $n$ observations, that is, the set of measurable functions from $\mathcal{X}^n$ to $\mathbb{H}$, where $\mathbb{H}$ is endowed with its Borel σ-field.
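Proposition 2. Let $f_0$ be in $\mathbb{H}_1$ and let $h$ be a real nonnegative function on $\mathcal{X}$, bounded by 1. Let $\mathcal{C}^*$ be a symmetric subset of $\mathbb{H}$ such that, for $\zeta$-a.e. $x \in \mathcal{X}$, the mapping $\theta \mapsto h(x)\pi_\theta(x)$ belongs to $\mathcal{C}^{*\perp}$. Then

$$\inf_{\hat f\in\mathcal{S}_n}\ \sup_{f\in(f_0+\mathcal{C}^*)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f - f\|_{\mathbb{H}}^2 \ge \sup\{\|f\|_{\mathbb{H}}^2 : f \in \mathcal{C}^*,\ f_0 \pm f \in \mathbb{H}_1\}\,(\pi_{f_0}h)^n. \tag{5}$$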

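Proof. Take $E = \{M_f^*\nu : f \in \mathbb{H}\cap L^1(\nu)\}$ and define the norm $\mathcal{N}(M_f^*\nu) = \|f\|_{\mathbb{H}}$ on this space. Note that, for all $f$ in $\mathbb{H}_1$, $\Pi^*M_f^*\nu = \pi_f$. Thus, for all $f \in \mathbb{H}_1$, if $p = 2$ and $\mu = M_f^*\nu$, the expectation in the left-hand side of (3) equals that in the left-hand side of (5).

Now put $\mu_0 = M_{f_0}^*\nu$ and let $\mathcal{C} = \{M_f^*\nu : f \in \mathcal{C}^*\} \cap \mathcal{M}(\Theta,\mathcal{G})$. From the assumptions on $\mathcal{C}^*$, it is clear that $\mathcal{C}$ is a symmetric set included in the kernel of $M_h^*\circ\Pi^*$. Hence, in order to apply Proposition 1, it only remains to verify that

$$\{M_f^*\nu : f \in (f_0+\mathcal{C}^*)\cap\mathbb{H}_1\} = (\mu_0 + \mathcal{C}) \cap E_1, \tag{6}$$

where $E_1$ is the set of probability distributions in $E$, that is, $E_1 = \{M_f^*\nu : f \in \mathbb{H}_1\}$. By observing that $\mathcal{C} \subseteq \{M_f^*\nu : f \in \mathcal{C}^*\}$, we get the inclusion "$\supseteq$" in (6). For showing the inverse inclusion, pick $g \in \mathcal{C}^*$ such that $f := f_0 + g$ is in $\mathbb{H}_1$. Since $f_0$ is in $\mathbb{H}_1$ as well, $M_g^*\nu \in \mathcal{M}(\Theta,\mathcal{G})$. This proves (6). □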

Re-examining the proof of Proposition 1 in this context shows that it uses arguments similar to those of the second part of the proof of Theorem 3.1 in [12], but does not use Lemma 1 in [23], where a lower bound on $\sup_f\mathbb{P}_f\{\|\hat f - f\|_{\mathbb{H}} > \Delta\}$ is derived, the supremum being over a given subset of $\mathbb{H}_1$.

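3. The projection estimator. Assume that $(X_k)_{1\le k\le n}$ are i.i.d. with density $\pi_f$ with $f$ in $\mathbb{H}_1$. We denote by $P_n$ the empirical distribution defined by

$$P_n h = \int h\,dP_n = \frac{1}{n}\sum_{k=1}^n h(X_k) \quad \text{for all } h : \mathcal{X} \to \mathbb{R}.$$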
" k=i 

Let $\mathcal{H}$ denote the linear space containing all real functions $h$ which satisfy $\pi_\theta|h| < \infty$ for all $\theta$. For introducing the projection estimator, it is convenient to consider $\Pi$ defined by (1) acting on $\mathcal{H}$. The definition of the projection estimator depends on a given nondecreasing sequence $(V_m)_{m\ge1}$ of finite-dimensional linear subspaces of $\mathbb{H}$. We put $d_m := \dim V_m$ and define $V_0 := \{0\}$. We assume without loss of generality that $V_m$ is included in $\Pi(\mathcal{H})$; otherwise we let $V_m \cap \Pi(\mathcal{H})$ replace $V_m$. We furthermore assume that, for any $g$ in $\bigcup_m V_m$, there exists a unique $h$ in $\mathcal{H}$ such that $\Pi h = g$, and we write $h = \Pi^{-1}g$. In other words, we assume that $\Pi$ is one-to-one on $\Pi^{-1}(\bigcup_m V_m)$. This is ensured, for instance, if $\Pi$ is one-to-one on $\mathcal{H}$, which, as observed by Barbe ([1], Lemma 5.1), simply means that the mixands are complete in the sense that, if $\Pi h(\theta) = 0$ for all $\theta$, then $h = 0$ (our $\Pi$ and $\mathcal{H}$ correspond to Barbe's $V$ and …, resp.). Moreover, he showed that for location and scale mixtures, identifiability of the mixands in the sense that $\pi_\mu = \pi_{\mu'}$ if and only if $\mu = \mu'$ implies that $\Pi$ is one-to-one ([1], Lemmas 5.2 and 5.3).

Definition 1. Let $\hat f_{m,n}$ be defined as the unique element in $V_m$ satisfying

$$\langle\hat f_{m,n}, g\rangle_{\mathbb{H}} = P_n\Pi^{-1}g \quad \text{for all } g \text{ in } V_m. \tag{7}$$

This estimator is called the projection estimator of $f$ of order $m$ [from $n$ observations and with respect to $(V_m)$].

From the assumptions above, the function which maps $g$ in $V_m$ to $P_n\Pi^{-1}g$ is a linear functional and, thus, (7) completely defines $\hat f_{m,n}$ by duality of the scalar product. The projection estimator relies on the following idea. First observe that in the Hilbert setting (2) reads

$$\pi_f h = \langle f, \Pi h\rangle_{\mathbb{H}} \tag{8}$$

and holds for all $h$ such that $\Pi h$ is in $\mathbb{H}$. Hence, for all $g$ in $V_m$, by the law of large numbers, $P_n\Pi^{-1}g$ tends to $\pi_f\Pi^{-1}g = \langle f, \Pi\Pi^{-1}g\rangle_{\mathbb{H}} = \langle f, g\rangle_{\mathbb{H}}$ as $n \to \infty$, so that $\hat f_{m,n}$ is approximately $\mathrm{Proj}_{V_m}f$ for large $n$. Making $m$ large as well, $\mathrm{Proj}_{V_m}f$ is roughly $f$, provided the closure of $\bigcup_{m\ge1}V_m$ contains $f$. An important part of the development is thus to find a suitable rate at which to increase $m$ with respect to $n$.

In practice, the projection estimator can be expressed using an orthonormal sequence $(\phi_k)_{k\ge0}$ in $\mathbb{H}$ such that $(\phi_k)_{0\le k\le d_m-1}$ is a basis of $V_m$ for all $m \ge 1$. The expansion of the projection estimator in this basis then reads

$$\hat f_{m,n} = \sum_{k=0}^{d_m-1}(P_n\Pi^{-1}\phi_k)\,\phi_k. \tag{9}$$

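For any random element $g$ in $\mathbb{H}$ such that $\mathbb{E}_f\|g\|_{\mathbb{H}}^2 < \infty$, we define its variance as $\mathrm{var}_f(g) := \mathbb{E}_f\|g - \mathbb{E}_f g\|_{\mathbb{H}}^2$. Under the i.i.d. assumption, the MISE of the projection estimator admits the following bias-variance decomposition.

Proposition 3. For all $f$ in $\mathbb{H}_1$, the MISE of $\hat f_{m,n}$ can be written as

$$\mathbb{E}_f\|\hat f_{m,n} - f\|_{\mathbb{H}}^2 = \|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}}^2 + \frac{1}{n}\mathrm{var}_f(\hat f_{m,1}). \tag{10}$$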
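Proof. Pythagoras' theorem gives

$$\|\hat f_{m,n} - f\|_{\mathbb{H}}^2 = \|\hat f_{m,n} - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}}^2 + \|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}}^2.$$

From (9) and (8), we have

$$\mathbb{E}_f\hat f_{m,n} = \sum_{k=0}^{d_m-1}(\pi_f\Pi^{-1}\phi_k)\,\phi_k = \sum_{k=0}^{d_m-1}\langle f,\phi_k\rangle_{\mathbb{H}}\,\phi_k = \mathrm{Proj}_{V_m}f.$$

Inserting this equality into the next to last display and taking expectations yields $\mathbb{E}_f\|\hat f_{m,n} - f\|_{\mathbb{H}}^2 = \|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}}^2 + \mathrm{var}_f(\hat f_{m,n})$. Using (9) and the orthonormality of $(\phi_k)$, we obtain

$$\mathrm{var}_f(\hat f_{m,n}) = \sum_{k=0}^{d_m-1}\mathrm{var}_f(P_n\Pi^{-1}\phi_k) = \frac{1}{n}\sum_{k=0}^{d_m-1}\mathrm{var}_f(P_1\Pi^{-1}\phi_k). \tag{11}$$

The proof is complete. □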

We finish this section by noting that in many cases the sequence $(V_m)$ is defined as $V_m = \mathrm{Span}(\Pi h_0,\dots,\Pi h_{d_m-1})$ for a sequence $(h_k)_{k\ge0}$ in $\mathcal{H}$ such that $(\Pi h_k)_{k\ge0}$ is a linearly independent sequence in $\mathbb{H}$. This constructive definition of $(V_m)$ automatically ensures that all the above assumptions are verified. Observe, however, that the projection estimator only depends on the sequence $(V_m)$, whence different choices of $(h_k)$ are possible. In particular, by the Gram-Schmidt procedure, we can construct an orthonormal sequence $(\phi_k)$ as

$$\phi_k = \sum_{\ell=0}^{k}\xi_{k,\ell}\,\Pi h_\ell \quad \text{for all } k \ge 0,$$

for some real coefficients $(\xi_{k,\ell})_{k,\ell\ge0}$ for which we set $\xi_{k,\ell} := 0$ for all $\ell > k \ge 0$. The sequence $(\phi_k)$ may then replace $(\Pi h_k)$ for defining the same sequence $(V_m)$, and in this context (9) becomes

$$\hat f_{m,n} = \sum_{k=0}^{d_m-1}\sum_{\ell=0}^{k}\xi_{k,\ell}(P_n h_\ell)\,\phi_k. \tag{12}$$
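The Gram-Schmidt coefficients $\xi_{k,\ell}$ can be obtained from the Gram matrix $R = [\langle\Pi h_k,\Pi h_\ell\rangle_{\mathbb{H}}]$ through a Cholesky factorization $R = LL^T$: the lower-triangular matrix $\Xi = L^{-1}$ satisfies $\Xi R\Xi^T = I$, which is Gram-Schmidt written in matrix form. The sketch below, in plain Python, computes $\Xi$ and then the coefficients of $\hat f_{m,n}$ in the basis $(\phi_k)$ following (12). The example Gram matrix assumes (hypothetically) Poisson mixands, $h_\ell = \mathbb{1}_\ell$ and $\nu$ Lebesgue measure on $[0,\infty)$, for which $\langle\Pi\mathbb{1}_k,\Pi\mathbb{1}_\ell\rangle_{\mathbb{H}} = \int_0^\infty e^{-2\theta}\theta^{k+\ell}/(k!\,\ell!)\,d\theta = \binom{k+\ell}{k}2^{-(k+\ell+1)}$. Function names are illustrative.

```python
import math

def cholesky(R):
    # Lower-triangular L with R = L L^T; R must be symmetric positive definite
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def lower_inverse(L):
    # Inverse of a lower-triangular matrix, column by column (forward substitution)
    m = len(L)
    inv = [[0.0] * m for _ in range(m)]
    for k in range(m):
        inv[k][k] = 1.0 / L[k][k]
        for i in range(k + 1, m):
            inv[i][k] = -sum(L[i][p] * inv[p][k] for p in range(k, i)) / L[i][i]
    return inv

def gram_schmidt_coefficients(R):
    # xi = L^{-1} is lower triangular (xi[k][l] = 0 for l > k) and xi R xi^T = I,
    # so phi_k = sum_{l <= k} xi[k][l] * Pi h_l is an orthonormal sequence
    return lower_inverse(cholesky(R))

def projection_coefficients(xi, Pn_h):
    # Coefficients of f_hat_{m,n} in the basis (phi_k), following (12):
    # sum_{l <= k} xi[k][l] * P_n h_l
    return [sum(xi[k][l] * Pn_h[l] for l in range(k + 1)) for k in range(len(xi))]

# Hypothetical example: Poisson mixands, h_l = 1_l, nu = Lebesgue measure on [0, inf)
m = 4
R = [[math.comb(k + l, k) / 2 ** (k + l + 1) for l in range(m)] for k in range(m)]
xi = gram_schmidt_coefficients(R)
# P_n 1_l replaced by its expectation 2^-(l+1) under f(theta) = exp(-theta), which lies in V_1
print(projection_coefficients(xi, [0.5 ** (l + 1) for l in range(m)]))  # approximately [0.7071, 0, 0, 0]
```

Since $f(\theta) = e^{-\theta} = \Pi\mathbb{1}_0$ already lies in $V_1$, only the first coefficient, $\|f\|_{\mathbb{H}} = 2^{-1/2}$, is nonzero up to rounding.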

4. Application to mixtures of discrete distributions. The basic assumption of this section is

$$\mathcal{X} = \mathbb{Z}_+ \text{ and } \zeta \text{ is counting measure.}$$

The case of continuous $\mathcal{X}$ seems to require deep adaptations and is left for future work. In the present setting we write $\mathbb{1}_k$ for the indicator function $\mathbb{1}_k(x) = \mathbb{1}(x = k)$ and take

$$V_m := \mathrm{Span}(\Pi\mathbb{1}_k,\ 0 \le k < m). \tag{13}$$
Notice that $\Pi\mathbb{1}_k = \pi_\cdot(k)$. We are hence in the constructive framework of Section 3 with $\dim V_m = m$, provided that $(\pi_\cdot(k))_{k\ge0}$ is a sequence of linearly independent functions in $\mathbb{H}$. In this section we thus make the following assumption.

(A1) $(\Pi\mathbb{1}_k)_{k\ge0}$ is a sequence of linearly independent functions in $\mathbb{H} \cap L^1(\nu)$.

Obviously, since $(\mathbb{1}_k)$ is a linearly independent sequence in $\mathcal{H}$, so is $(\Pi\mathbb{1}_k)$, provided $\Pi$ is one-to-one. We recall that this holds whenever the mixands are complete (see Section 3). Assumption (A1) implies that the projection estimator $\hat f_{m,n}$ is well defined and, as a linear combination of the $\Pi\mathbb{1}_k$'s, belongs to $L^1(\nu)$ for all $m$ and $n$. Hence, it is a good candidate for estimating a probability density function with respect to $\nu$. We elaborate further on this assumption in Section 4.3.
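Under (13), taking $g = \Pi\mathbb{1}_k$ in (7) gives $\langle\hat f_{m,n},\Pi\mathbb{1}_k\rangle_{\mathbb{H}} = P_n\mathbb{1}_k$, the empirical frequency of $k$. Writing $\hat f_{m,n} = \sum_{\ell<m}c_\ell\,\Pi\mathbb{1}_\ell$ therefore turns Definition 1 into the linear system $R_m c = (P_n\mathbb{1}_k)_{0\le k<m}$, where $R_m$ is the Gram matrix $[\langle\Pi\mathbb{1}_k,\Pi\mathbb{1}_\ell\rangle_{\mathbb{H}}]_{0\le k,\ell<m}$. The sketch below assumes, purely for illustration, Poisson mixands and $\nu$ Lebesgue measure on $[0,\infty)$ (where (A1) holds and $\langle\Pi\mathbb{1}_k,\Pi\mathbb{1}_\ell\rangle_{\mathbb{H}} = \binom{k+\ell}{k}2^{-(k+\ell+1)}$); it solves the system in exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, exp, factorial

def poisson_gram(m):
    # R_m[k][l] = <Pi 1_k, Pi 1_l> = int_0^inf e^{-2t} t^(k+l) / (k! l!) dt = C(k+l, k) / 2^(k+l+1)
    return [[Fraction(comb(k + l, k), 2 ** (k + l + 1)) for l in range(m)] for k in range(m)]

def solve(A, b):
    # Exact Gauss-Jordan elimination; A is assumed nonsingular (true for R_m under (A1))
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

def projection_estimator(sample, m):
    # Coefficients c of f_hat_{m,n} = sum_l c_l Pi 1_l, solving R_m c = (P_n 1_k)_{k<m}
    n = len(sample)
    freqs = [Fraction(sum(1 for x in sample if x == k), n) for k in range(m)]
    return solve(poisson_gram(m), freqs)

def evaluate(coeffs, theta):
    # f_hat(theta) = sum_l c_l e^{-theta} theta^l / l!
    return sum(float(c) * exp(-theta) * theta ** l / factorial(l) for l, c in enumerate(coeffs))

coeffs = projection_estimator([0, 0, 0, 0, 1, 1, 2, 3], m=3)
print(coeffs)  # [Fraction(1, 1), Fraction(0, 1), Fraction(0, 1)]
```

In this toy sample the frequencies of 0, 1, 2 are $1/2, 1/4, 1/8$, which equal $\langle e^{-\theta},\Pi\mathbb{1}_k\rangle_{\mathbb{H}}$, so the estimator returns $e^{-\theta} = \Pi\mathbb{1}_0$ exactly; observations $\ge m$ do not enter. Exact fractions are used because such Gram matrices can become badly conditioned in floating point as $m$ grows.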

The results of Sections 2 and 3 may be used for bounding the minimax MISE $\inf_{\hat f\in\mathcal{S}_n}\sup_{f\in\mathcal{C}}\mathbb{E}_f\|\hat f - f\|_{\mathbb{H}}^2$ for particular smoothness classes $\mathcal{C}$, which we now introduce.

For any positive decreasing sequence $u = (u_m)_{m\ge0}$, any positive number $C$ and any nonnegative integer $r$, define

$$\mathcal{C}(u,C,r) := \{f \in \mathbb{H} : \|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}} \le Cu_m \text{ for all } m \ge r\}. \tag{14}$$

Note that, for $r \ge 1$, the classes $\mathcal{C}(u,0,r)$ do not reduce to $\{0\}$ but to $V_r$. Also note that one may assume $u_0 = 1$ without loss of generality, in which case, recalling the convention $V_0 = \{0\}$,

$$\mathcal{C}(u,C,0) = \{f \in \mathbb{H} : \|f\|_{\mathbb{H}} \le C,\ \|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}} \le Cu_m \text{ for all } m \ge 1\}. \tag{15}$$

Usually we simply write $\mathcal{C}(u,C)$ for $\mathcal{C}(u,C,0)$. This set can be interpreted as the ball of functions whose rate of approximation by projections on the spaces $V_m$ is controlled by $(u_m)$ within a radius $C$. Finally, observe that having $\lim_m u_m = 0$ amounts to saying that $\mathcal{C}(u,C,r)$ is a subset of the closure of $\bigcup_{m\ge1}V_m$ in $\mathbb{H}$.
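When $f$ is given through its coefficients $a_k = \langle f,\phi_k\rangle_{\mathbb{H}}$ in an orthonormal sequence with $V_m = \mathrm{Span}(\phi_0,\dots,\phi_{m-1})$ (so $d_m = m$, as under (13)), one has $\|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}}^2 = \sum_{k\ge m}a_k^2$, and membership in (14) reduces to comparing tail sums with $Cu_m$. The sketch below assumes, hypothetically, that only finitely many coefficients are nonzero and that $u$ is supplied as a Python function of $m$.

```python
import math

def tail_norms(a):
    # For f = sum_k a[k] phi_k with (phi_k) orthonormal and V_m = span(phi_0, ..., phi_{m-1}):
    # tails[m] = ||f - Proj_{V_m} f||_H = sqrt(sum_{k >= m} a[k]^2), for m = 0, ..., len(a)
    tails, acc = [0.0] * (len(a) + 1), 0.0
    for m in range(len(a) - 1, -1, -1):
        acc += a[m] ** 2
        tails[m] = math.sqrt(acc)
    return tails

def in_class(a, u, C, r):
    # Membership in C(u, C, r) of (14): ||f - Proj_{V_m} f|| <= C u(m) for all m >= r.
    # For m >= len(a) the tail is 0, so only m = r, ..., len(a) - 1 need checking.
    tails = tail_norms(a)
    return all(tails[m] <= C * u(m) for m in range(r, len(a)))

print(in_class([0.6, 0.8], lambda m: 2.0 ** -m, C=2.0, r=0))  # True
```

The case $C = 0$ illustrates the remark after (14): with $r = 2$, any $f \in V_2$ passes the test, while with $r = 1$ it fails unless $a_1 = 0$.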

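For any fixed $f_0$ in $\mathbb{H}_+$ we define the following semi-norm on $\mathbb{H}$:

$$\|f\|_{\infty,f_0} := \nu\text{-}\operatorname*{ess\,sup}_{\theta\in\Theta}\frac{|f(\theta)|}{f_0(\theta)}, \tag{16}$$

with the convention $0/0 = 0$ and $s/0 = \infty$ for $s > 0$. This semi-norm is not necessarily finite. Also introduce, for any subspace $V$ of $\mathbb{H}$,

$$K_{\infty,f_0}(V) := \sup\{\|f\|_{\infty,f_0} : f \in V,\ \|f\|_{\mathbb{H}} = 1\}.$$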
Finally, we define for any positive numbers $K$ and $C$, any sequence $u = (u_m)$ as above and any nonnegative integer $r$,

$$\mathcal{C}_{f_0}(K,u,C,r) := \{f \in \mathcal{C}(u,C,r) : \|f\|_{\infty,f_0} \le K\} = \mathcal{C}(u,C,r) \cap \{f \in \mathbb{H} : \|f\|_{\infty,f_0} \le K\}; \tag{17}$$

again, just as for $\mathcal{C}(u,C)$, we write $\mathcal{C}_{f_0}(K,u,C)$ for $\mathcal{C}_{f_0}(K,u,C,0)$.

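4.1. A lower bound on the MISE under (A1). The following result is derived from Proposition 2 using the smoothness classes above.

Theorem 1. Let $f_0$ be in $\mathbb{H}_1$, $u = (u_m)_{m\ge0}$ a positive decreasing sequence, $C$ a positive number, $r$ a nonnegative integer and $K$ a positive number such that $K \le 1$. Then for any positive integer $n$, any estimator $\hat f_n$ in $\mathcal{S}_n$ and any integer $m \ge r$,

$$\sup_{f\in(f_0+\mathcal{C}_{f_0}(K,u,C,r))\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_n - f\|_{\mathbb{H}}^2 \ge \left(Cu_{m+1}\wedge\frac{K}{K_{\infty,f_0}(V_{m+2}\ominus V_m)}\right)^2\left(\pi_{f_0}\{0,\dots,m-1\}\right)^n. \tag{18}$$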

Remark. For the lower bound (18) to be nontrivial, $K_{\infty,f_0}(V_{m+2}\ominus V_m)$ must be finite. Since $V_{m+2}\ominus V_m$ is finite-dimensional, this is true if $\|\cdot\|_{\infty,f_0}$ is a finite norm on $V_{m+2}\ominus V_m$.

Remark. The lower bound (18) can be optimized over all $m \ge r$. In most cases $K_{\infty,f_0}(V_{m+2}\ominus V_m)$ behaves like $K_{\infty,f_0}(V_m)$ and thus increases as $m$ gets large. Hence, the squared term in the lower bound decreases when $m$ gets large while, in contrast, $\pi_{f_0}\{0,\dots,m-1\}$ increases to 1 as $m$ tends to infinity.

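The proof of the theorem is prefaced by two lemmas.

Lemma 1. Let $f_0$ be in $\mathbb{H}_+$. Then for all $f$ in $\mathbb{H}$,

$$\|f\|_{\infty,f_0} = \sup_{g\in\mathbb{H}}\frac{|\langle f,g\rangle_{\mathbb{H}}|}{\langle f_0,|g|\rangle_{\mathbb{H}}}, \tag{19}$$

with the convention $0/0 = 0$ and $s/0 = \infty$ for $s > 0$.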

Proof. First assume that there is a Borel subset $A$ of $\Theta$ with $\nu(A) > 0$ and such that both $f_0 = 0$ and $|f| > 0$ on $A$. It then follows immediately that the left-hand side of (19) is infinite, and so is the right-hand side (take $g = f\mathbb{1}_A$).
Now assume that there is no such set $A$. Using the convention $0/0 = 0$, we then have $f = (f/f_0)f_0$ $\nu$-a.e. Letting $\mu_0$ be the measure having density $f_0$ with respect to $\nu$, we find that the left-hand side of (19), $\nu$-$\operatorname{ess\,sup}|f/f_0|$, equals $\mu_0$-$\operatorname{ess\,sup}|f/f_0|$. Furthermore, $\mu_0$ is a σ-finite measure. Indeed, since $\nu$ is σ-finite, the Cauchy-Schwarz inequality shows that $\mu_0(K) = \langle f_0,\mathbb{1}_K\rangle_{\mathbb{H}} \le \|f_0\|_{\mathbb{H}}(\nu\mathbb{1}_K)^{1/2} < \infty$ for any compact set $K$. Hence, the space $L^\infty(\mu_0)$ and the dual $L^1(\mu_0)^*$ are isometric (see [9], Theorem 4.14.6), implying that the left-hand side of (19) equals

$$\mu_0\text{-}\operatorname{ess\,sup}|f/f_0| = \sup_{g:\,\mu_0|g|=1}|\mu_0[(f/f_0)g]| = \sup_{g:\,\mu_0|g|<\infty}\frac{|\mu_0[(f/f_0)g]|}{\mu_0|g|},$$

again with the convention $0/0 = 0$. It now remains to show that this display is equal to the right-hand side of (19).

To do that, notice that, for any $g$ in $\mathbb{H}$, $\langle f,g\rangle_{\mathbb{H}} = \nu(fg) = \mu_0[(f/f_0)g]$ and $\langle f_0,|g|\rangle_{\mathbb{H}} = \mu_0|g|$. Thus, the right-hand side of (19) is the supremum of the same ratio as in the right-hand side of the last display, but over $g$ in $\mathbb{H}$ rather than over $g$ in $L^1(\mu_0)$. However, these suprema are, in fact, identical, which concludes the proof. To see the equality, first observe that since $\mu_0|g| = \langle|g|,f_0\rangle_{\mathbb{H}} \le \|g\|_{\mathbb{H}}\|f_0\|_{\mathbb{H}}$ for any $g$ in $\mathbb{H}$ (Cauchy-Schwarz), $\mathbb{H}$ is included in $L^1(\mu_0)$. The inverse inclusion does not hold, but, by optimizing the sign of $g$ in the two suprema, we may replace $f$ by $|f|$ in the numerators and restrict the suprema to nonnegative $g$'s, and then use the result that any nonnegative function $g$ in $L^1(\mu_0)$ can be approximated by an increasing sequence of functions in $\mathbb{H}$ [e.g., by $(g\mathbb{1}_{\{|g|\le M\}})_{M>0}$]. □

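Lemma 2. Adopt the assumptions of Theorem 1 and denote by $\mathcal{C}^*$ the set $\mathcal{C}_{f_0}(K,u,C,r) \cap V_m^\perp$. We then have the upper and lower bounds

$$\sup\{\|f\|_{\mathbb{H}} : f \in \mathcal{C}^*,\ f_0 \pm f \in \mathbb{H}_1\} \le Cu_m \tag{20}$$

and

$$\sup\{\|f\|_{\mathbb{H}} : f \in \mathcal{C}^*,\ f_0 \pm f \in \mathbb{H}_1\} \ge Cu_{m+1}\wedge\frac{K}{K_{\infty,f_0}(V_{m+2}\ominus V_m)}. \tag{21}$$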

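Proof. We start with the upper bound (20). Pick $f$ in $\mathcal{C}^*$. Since $f$ is then in $\mathcal{C}(u,C,r)$, $\|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}} \le Cu_m$ for $m \ge r$. However, because $f$ is also in $V_m^\perp$, $\mathrm{Proj}_{V_m}f = 0$, and, thus, $\|f\|_{\mathbb{H}} \le Cu_m$.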

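We now turn to the lower bound. Let $(\phi_k)_{k\ge0}$ be an orthonormal sequence in $\mathbb{H}$ such that $V_m = \mathrm{Span}(\phi_0,\dots,\phi_{m-1})$ for all $m \ge 1$ (see Section 3). Using the fact that $\sum_{k\ge0}\Pi\mathbb{1}_k = \Pi 1 = 1$, monotone convergence provides

$$\sum_{k\ge0}\langle\Pi\mathbb{1}_\ell,\Pi\mathbb{1}_k\rangle_{\mathbb{H}} = \langle\Pi\mathbb{1}_\ell,1\rangle_{\mathbb{H}} = \nu\Pi\mathbb{1}_\ell \quad \text{for all } \ell \ge 0.$$

The right-hand side of this equation is finite by (A1). Since $\phi_\ell$ is a linear combination of $(\Pi\mathbb{1}_s)_{0\le s\le\ell}$, we obtain

$$\sum_{k\ge0}|\langle\phi_\ell,\Pi\mathbb{1}_k\rangle_{\mathbb{H}}| < \infty \quad \text{for all } \ell \ge 0. \tag{22}$$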

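We shall now prove (21) by constructing a function $f$ in $\mathcal{C}^*$ satisfying $f_0 \pm f \in \mathbb{H}_1$ and whose norm equals the right-hand side of (21). To that end, note that, by (22), we can find two numbers $\alpha$ and $\beta$ such that

$$\alpha\sum_{k\ge0}\langle\phi_m,\Pi\mathbb{1}_k\rangle_{\mathbb{H}} + \beta\sum_{k\ge0}\langle\phi_{m+1},\Pi\mathbb{1}_k\rangle_{\mathbb{H}} = 0 \tag{23}$$

and, putting $f := \alpha\phi_m + \beta\phi_{m+1}$,

$$\|f\|_{\mathbb{H}} = (\alpha^2+\beta^2)^{1/2} = Cu_{m+1}\wedge\frac{K}{K_{\infty,f_0}(V_{m+2}\ominus V_m)}. \tag{24}$$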

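To finish the proof, we need to show that $f \in \mathcal{C}^*$ and $f_0 \pm f \in \mathbb{H}_1$. To start with, we note that $f$ lies in $V_{m+2}$ and that $f \perp V_m$. Therefore, $\|f - \mathrm{Proj}_{V_p}f\|_{\mathbb{H}} = 0$ for all $p \ge m+2$. Moreover, $\|f - \mathrm{Proj}_{V_{m+1}}f\|_{\mathbb{H}} = |\beta| \le Cu_{m+1}$ and, since $(u_p)$ is decreasing, $\|f - \mathrm{Proj}_{V_p}f\|_{\mathbb{H}} = (\alpha^2+\beta^2)^{1/2} \le Cu_p$ for all $p = r,\dots,m$. All this implies that $f$ lies in $\mathcal{C}(u,C,r)$. Using (24), we also see that $\|f\|_{\infty,f_0} \le \|f\|_{\mathbb{H}}K_{\infty,f_0}(V_{m+2}\ominus V_m) \le K$, so that $f$ belongs to $\mathcal{C}_{f_0}(K,u,C,r)$. Thus, $f \in \mathcal{C}^*$.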

Finally, as a finite linear combination of $L^1(\nu)$ functions, $f$ is in $L^1(\nu)$. Hence, dominated convergence and (23) yield $\nu f = \sum_{k\ge0}\langle f,\Pi\mathbb{1}_k\rangle_{\mathbb{H}} = 0$. By Lemma 1, we also find that, for all $g$ in $\mathbb{H}_+$,

$$\langle f_0 + f, g\rangle_{\mathbb{H}} \ge \langle f_0, g\rangle_{\mathbb{H}}(1 - \|f\|_{\infty,f_0}) \ge 0,$$

where we have used $K \le 1$. Taking $g = (f_0+f)_- := -(f_0+f) \vee 0$ yields $-\|(f_0+f)_-\|_{\mathbb{H}}^2 \ge 0$, whence $(f_0+f)_- = 0$ and $f_0 + f \in \mathbb{H}_+$. Together with $\nu f = 0$, this shows that $f_0 + f \in \mathbb{H}_1$. The same arguments hold true for $f_0 - f$ and the proof is complete. □



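Proof of Theorem 1. Take $\mathcal{C}^*$ as in Lemma 2 and define $h : \mathcal{X} \to \{0,1\}$ by $h(x) := \mathbb{1}(0 \le x < m)$, so that $\pi_{f_0}h = \pi_{f_0}\{0,\dots,m-1\}$. Then any mapping $\theta \mapsto h(x)\pi_\theta(x)$ is either identically zero (if $x \ge m$) or equal to $\pi_\theta(x) = \Pi\mathbb{1}_x(\theta)$ (when $x < m$). Since such a $\Pi\mathbb{1}_x$ trivially lies in $V_m$, it is orthogonal to $\mathcal{C}^*$. Thus, the conditions of Proposition 2 are met. Proposition 2, Lemma 2 and the trivial observation $\mathcal{C}^* \subseteq \mathcal{C}_{f_0}(K,u,C,r)$ now prove the theorem. □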
4.2. An upper bound on the MISE under (A1). We shall now derive an upper bound on the MISE in the same context as above, by bounding the MISE of the projection estimator. The bias in Proposition 3 is trivially bounded within the smoothness classes defined above, so what remains to do is to bound the variance term uniformly over the same classes.

In the following we denote by $R_m$ the $m\times m$ upper-left submatrix of the infinite array $[\langle\Pi\mathbb{1}_k,\Pi\mathbb{1}_\ell\rangle_{\mathbb{H}}]_{k,\ell\ge0}$. Under (A1), $R_m$ is a symmetric positive definite matrix for all $m \ge 1$. For $f$ in $\mathbb{H}_+$, we denote by $A_{f,m}$ the $m\times m$ diagonal matrix having entries $\pi_f\mathbb{1}_k = \langle f,\Pi\mathbb{1}_k\rangle_{\mathbb{H}}$, $0 \le k < m$, on its diagonal.

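Theorem 2. Let $f_\infty$ be in $\mathbb{H}_+$, $u = (u_m)_{m\ge0}$ a positive decreasing sequence, $K$ and $C$ positive numbers and $r$ a nonnegative integer. Then for any positive integer $n$ and any integer $m \ge r$,

$$\sup_{f\in\mathcal{C}_{f_\infty}(K,u,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m,n} - f\|_{\mathbb{H}}^2 \le (Cu_m)^2 + \frac{K}{n}\mathrm{tr}(R_m^{-1}A_{f_\infty,m}). \tag{25}$$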

Remark. The upper bound (25) can be optimized over all $m \ge r$. As
expected, the bias term decreases and the variance bound increases as m 
grows. 

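Proof of Theorem 2. Pick a probability density $f$ in $\mathcal{C}_{f_\infty}(K,u,C,r)$ and depart from Proposition 3, noting that the squared bias term is bounded by $(Cu_m)^2$. Regarding the variance term, it is sufficient to consider $n = 1$. Let $\mathbf{f}_m$ denote the column vector of coordinates of $\hat f_{m,1}$ in the basis $(\Pi\mathbb{1}_k)_{0\le k<m}$ of $V_m$. By Definition 1 and the definition of $R_m$, $\mathbf{f}_m = R_m^{-1}P_1\mathbf{1}^{(m)}$, where $\mathbf{1}^{(m)}$ is the column vector function with entries $\mathbb{1}_k$, $0 \le k < m$. Then

$$\begin{aligned}
\mathrm{var}_f(\hat f_{m,1}) &= \mathbb{E}_f\|\hat f_{m,1}\|_{\mathbb{H}}^2 - \|\mathbb{E}_f\hat f_{m,1}\|_{\mathbb{H}}^2 \\
&\le \mathbb{E}_f[\mathbf{f}_m^T R_m\mathbf{f}_m] \\
&= \mathbb{E}_f[(P_1\mathbf{1}^{(m)})^T R_m^{-1}(P_1\mathbf{1}^{(m)})] \\
&= \mathrm{tr}(R_m^{-1}\pi_f[\mathbf{1}^{(m)}\mathbf{1}^{(m)T}]).
\end{aligned}$$

The proof is concluded by observing that $R_m^{-1}$, as a positive definite symmetric matrix, has positive entries on its diagonal, and by noting that $\pi_f[\mathbf{1}^{(m)}\mathbf{1}^{(m)T}] = A_{f,m} \le \|f\|_{\infty,f_\infty}A_{f_\infty,m} \le KA_{f_\infty,m}$ entrywise. □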
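The bound (25) can be evaluated exactly once $R_m$ and $A_{f_\infty,m}$ are known. The sketch below assumes, for illustration only, Poisson mixands, $\nu$ Lebesgue measure on $[0,\infty)$ and $f_\infty(\theta) = e^{-\theta}$, for which $A_{f_\infty,m}$ has diagonal entries $\int_0^\infty e^{-2\theta}\theta^k/k!\,d\theta = 2^{-(k+1)}$. The sequence $u$ and the constants $K$, $C$ are hypothetical inputs, and the choice of $m$ minimizes the bound as suggested in the remark after Theorem 2.

```python
from fractions import Fraction
from math import comb

def poisson_gram(m):
    # R_m[k][l] = C(k+l, k) / 2^(k+l+1) for Poisson mixands and Lebesgue nu on [0, inf)
    return [[Fraction(comb(k + l, k), 2 ** (k + l + 1)) for l in range(m)] for k in range(m)]

def inverse(A):
    # Exact Gauss-Jordan inversion of a nonsingular matrix of Fractions
    m = len(A)
    M = [row[:] + [Fraction(int(i == j)) for j in range(m)] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[m:] for row in M]

def variance_term(m):
    # tr(R_m^{-1} A_{f_inf,m}) with f_inf(theta) = e^{-theta}: A has diagonal 2^-(k+1)
    r_inv = inverse(poisson_gram(m))
    return sum(r_inv[k][k] / 2 ** (k + 1) for k in range(m))

def upper_bound(m, n, K, C, u):
    # Right-hand side of (25): squared bias bound plus (K / n) times the trace term
    return (C * u(m)) ** 2 + K / n * float(variance_term(m))

def best_order(n, K, C, u, r=0, m_max=15):
    # Order m >= r minimizing the bound (25) over a finite range
    return min(range(r, m_max + 1), key=lambda m: upper_bound(m, n, K, C, u))

print(best_order(100, K=1.0, C=1.0, u=lambda m: 2.0 ** -m, m_max=6))  # 2
```

Here the trace term equals 1, 4 and 17 for $m = 1, 2, 3$, so with $n = 100$ and $u_m = 2^{-m}$ the bound is $0.26$, $0.1025$ and about $0.186$, and $m = 2$ is selected. Larger $n$ shifts the optimum toward larger $m$.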

Remark. Our objective here is only to provide an upper bound that is uniform over a given class of densities. For power series mixtures (Section 5) and mixtures of uniforms (Section 7), the bound on the variance $\mathrm{var}_f(\hat f_{m,1})$ will be made more explicit by using orthogonal sequences. These bounds will then be derived directly from (12). However, they are closely related to the upper bound derived above. Indeed, let $\Xi_m$ denote the matrix $(\xi_{k,\ell})_{0\le k,\ell<m}$, where $\xi_{k,\ell}$ is as in Section 3. Observing that $\langle\phi_k,\phi_\ell\rangle_{\mathbb{H}} = (\Xi_m R_m\Xi_m^T)_{k,\ell}$ for all $0 \le k,\ell < m$, we obtain $R_m^{-1} = \Xi_m^T\Xi_m$. This relates (25) to orthonormal sequence techniques.

4.3. Existence of smooth densities. Theorems 1 and 2 provide lower and 
upper bounds, respectively, on the MISE. The classes over which these 
bounds apply are different in structure though; the class in Theorem 1 is a 
ball centered at $f_0$, while that in Theorem 2 is centered at 0. Therefore, the
two bounds are not immediately comparable. The purpose of the following 
result is to show that under some conditions the former class is included in 
the latter one, thus implying that the lower bound is indeed smaller than 
the upper bound. 

Proposition 4. Let $f_{00}$ be in $\mathbb{H}_+$, $u = (u_m)_{m\ge0}$ a positive decreasing sequence and $r$ a nonnegative integer. Assume that we have a density $f_0$ in $\mathbb{H}_1$ and a nonnegative $C_0$ such that $f_0$ belongs to $\mathcal{C}(u,C_0,r)$.

Then for any positive $K$ and $K'$ satisfying $K'/(1+K) \ge \|f_0\|_{\infty,f_{00}}$ and any nonnegative $C$ and $C'$ satisfying $C' - C \ge C_0$, the inclusion

$$f_0 + \mathcal{C}_{f_0}(K,u,C,r) \subseteq \mathcal{C}_{f_{00}}(K',u,C',r) \tag{26}$$

holds.

Proof. This follows from the inclusion $\mathcal{C}(u,C_0,r) + \mathcal{C}(u,C,r) \subseteq \mathcal{C}(u,C_0+C,r)$ and the inequality $\|f_0 + h\|_{\infty,f_{00}} \le (1 + \|h\|_{\infty,f_0})\|f_0\|_{\infty,f_{00}}$. □

In the case where the inclusion (26) holds, the lower and upper bounds of Theorems 1 and 2, respectively, apply in a common setting. Hence, it is important to be able, given a smoothness class, to find $f_0$ satisfying the assumptions of Proposition 4. Under (A1), given any sequence $u = (u_m)$, it is always possible to find a nonnegative number $C_0$ such that the class $\mathcal{C}(u,C_0,r)$ contains a probability density $f_0$ for all nonnegative $r$. Take $f_0 = \Pi_0/\nu\Pi_0$; we then trivially have $f_0 \in \mathbb{H}_1$ and $f_0 \in \mathcal{C}(u,C_0,r)$ for all $C_0 \ge 0$ if $r > 0$, or for all $C_0 \ge \|f_0\|_{\mathbb{H}}/u_0$ otherwise. This choice will indeed be made in the case of a power series mixture in Section 5. In general, however, this $f_0$ does not guarantee that the norm $\|\cdot\|_{\infty,f_0}$ is finite on the spaces $(V_m)_{m\ge0}$, which is crucial for the lower bound (see the remark following Theorem 2). In the rest of this section we provide a general construction of $f_0$ which satisfies this constraint.

Define

$$\mathbb{H}_* := \Big\{\sum_{k\ge0}\alpha_k\Pi_k \in \mathbb{H} : \alpha_k > 0 \text{ for all } k \ge 0\Big\}.$$

By $\sum_{k\ge0}\alpha_k\Pi_k \in \mathbb{H}$, we mean that $\sum_{k=0}^{n}\alpha_k\Pi_k$ converges in $\mathbb{H}$ as $n$ tends to infinity. Since the series has nonnegative terms, the monotone convergence theorem shows that this is equivalent to

$$\int\Big(\sum_{k\ge0}\alpha_k\Pi_k\Big)^2 d\nu < \infty.$$



Of course, $\mathbb{H}_*$ is contained in $\mathbb{H}_+$, and for any function $f = \sum_{k\ge0}\alpha_k\Pi_k \in \mathbb{H}_*$, we have $\|\Pi_k\|_{\infty,f} \le 1/\alpha_k$ for all $k$; consequently, $\|\cdot\|_{\infty,f}$ is a finite norm on every $V_m$. We now show the existence of a "smooth probability density" $f_0$ in $\mathbb{H}_*$, given any smoothness sequence $u = (u_m)$.

Proposition 5. Assume (A1). Then $\mathbb{H}_*$ and $\mathbb{H}_1$ have a nonempty intersection. Moreover, for any positive decreasing sequence $(u_m)$, the following holds:

(i) For any positive $C_0$, there are elements in $\mathbb{H}_*\cap\mathbb{H}_1$ which also belong to $\mathcal{C}(u,C_0,1)$ and, hence, to $\mathcal{C}(u,C_0,r)$ for any positive integer $r$.

(ii) There exists a positive constant $C_0$ such that there are elements in $\mathbb{H}_*\cap\mathbb{H}_1$ which also belong to $\mathcal{C}(u,C_0)$.

Proof. The linear independence part of (A1) implies $\nu\Pi_k > 0$ for all $k$. For any positive sequence $(\alpha_k)$, a simple sufficient condition to have $\sum_k\alpha_k\Pi_k$ in $\mathbb{H}$ is absolute convergence, that is, $\sum_k\alpha_k\|\Pi_k\|_{\mathbb{H}} < \infty$. Moreover, by the monotone convergence theorem,

$$\nu\Big(\sum_{k\ge0}\alpha_k\Pi_k\Big) = \sum_{k\ge0}\alpha_k\,\nu\Pi_k.$$

Hence, we may pick $(\alpha_k)$ with $\alpha_k > 0$ for all $k$ and such that $\sum_k\alpha_k\Pi_k$ is both in $\mathbb{H}$ and in $L^1(\nu)$. After appropriate normalization, it is then also in $\mathbb{H}_1$. This proves the first part of the proposition.

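For any $f = \sum_k\alpha_k\Pi_k \in \mathbb{H}_*$, since $\sum_{k=0}^{m-1}\alpha_k\Pi_k \in V_m$ and since $\mathrm{Proj}_{V_m}f$ minimizes $\|f - g\|_{\mathbb{H}}$ over $g \in V_m$, we have

$$\|f - \mathrm{Proj}_{V_m}f\|_{\mathbb{H}} \le \Big\|\sum_{k\ge m}\alpha_k\Pi_k\Big\|_{\mathbb{H}} \le \sum_{k\ge m}\alpha_k\|\Pi_k\|_{\mathbb{H}} \quad\text{for all } m \ge 0.$$

Hence, for $f$ to be in $\mathcal{C}(u,C,r)\cap\mathbb{H}_*\cap\mathbb{H}_1$, it is sufficient that $(\alpha_k)$ satisfies

$$\sum_{k\ge0}\alpha_k\,\nu\Pi_k = 1 \quad\text{and}\quad \sum_{k\ge m}\alpha_k\|\Pi_k\|_{\mathbb{H}} \le Cu_m \quad\text{for all } m \ge r. \tag{27}$$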

The second constraint simply says that the $\alpha_k$'s cannot be too large for $k \ge r$. If $r \ge 1$, the first constraint is then met by adapting the values of $\alpha_k$ for $k = 0,\dots,r-1$. If $r = 0$, then $C$ must be taken large enough for both constraints to be compatible. We now formalize these ideas.

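Let $(v_m)_{m\ge0}$ be a positive decreasing sequence such that $v_m \le u_m$ for all $m \ge 0$ and $\lim_m v_m = 0$. Define a sequence $(\beta_k)$ by

$$\beta_k := (v_k - v_{k+1})\big(\|\Pi_k\|_{\mathbb{H}} \vee \nu\Pi_k\big)^{-1} \quad\text{for all } k \ge 0.$$

Then, by construction, $(\beta_k)$ is a positive sequence and, for all $m \ge 0$, both $\sum_{k\ge m}\beta_k\|\Pi_k\|_{\mathbb{H}}$ and $\sum_{k\ge m}\beta_k\,\nu\Pi_k$ are at most $v_m \le u_m$ (the sums of $v_k - v_{k+1}$ telescope). Now pick a positive number $C$. Take $\alpha_k = \lambda\beta_k$ for all $k > 0$, where $0 < \lambda \le C$ and $\lambda < (\sum_{k>0}\beta_k\,\nu\Pi_k)^{-1}$. Then the second part of (27) holds with $r = 1$, and we may choose $\alpha_0 > 0$ so as to ensure the first part of (27). It follows that $f_0 := \sum_k\alpha_k\Pi_k \in \mathbb{H}_*\cap\mathbb{H}_1\cap\mathcal{C}(u,C,1)$. This proves (i).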

For the case $r = 0$, define $C_0 := (\sum_{k\ge0}\beta_k\,\nu\Pi_k)^{-1}$; this is a finite positive number by the definition of $(\beta_k)$. Putting $\alpha_k = C_0\beta_k$ for all $k \ge 0$, (27) holds for $C \ge C_0$ and $r = 0$. This proves (ii). □
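The construction in the proof of Proposition 5(i) is explicit and can be sketched in a few lines of Python. The sketch works with a truncated index range $k < K$, takes $v = u$ (here $u_m = (1+m)^{-2}$), chooses $\lambda = \min(C, 1/(2S))$ with $S = \sum_{0<k<K}\beta_k\nu\Pi_k$, and uses placeholder values for $\|\Pi_k\|_{\mathbb H}$ and $\nu\Pi_k$; all of these are hypothetical inputs chosen only to exercise the constraints (27).

```python
def construct_alpha(norms, masses, v, C):
    """Coefficients alpha_k of f_0 = sum_k alpha_k Pi_k, following the proof of Proposition 5(i).

    norms[k] ~ ||Pi_k||_H and masses[k] ~ nu Pi_k for k = 0..K-1 (truncated, hypothetical inputs);
    v[k] for k = 0..K is positive, decreasing and satisfies v[k] <= u[k].
    """
    K = len(norms)
    beta = [(v[k] - v[k + 1]) / max(norms[k], masses[k]) for k in range(K)]
    S = sum(beta[k] * masses[k] for k in range(1, K))
    lam = min(C, 0.5 / S)                      # 0 < lambda <= C and lambda * S < 1
    alpha = [0.0] + [lam * beta[k] for k in range(1, K)]
    alpha[0] = (1.0 - lam * S) / masses[0]     # enforces sum_k alpha_k nu Pi_k = 1
    return alpha

K, C = 40, 0.3
u = [(1 + m) ** -2.0 for m in range(K + 1)]    # smoothness sequence u_m = (1+m)^{-2}
norms = [1.0 / (k + 1) for k in range(K)]      # placeholder values for ||Pi_k||_H
masses = [0.5 ** k for k in range(K)]          # placeholder values for nu Pi_k
alpha = construct_alpha(norms, masses, u, C)   # take v = u
```

With these inputs all $\alpha_k$ are positive, the first part of (27) holds exactly and each tail sum $\sum_{k\ge m}\alpha_k\|\Pi_k\|_{\mathbb H}$, $m \ge 1$, is at most $\lambda v_m \le Cu_m$.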

4.4. Minimax optimality. By optimizing the bounds (18) and (25) over $m \ge r$ in a common setting (as detailed in the previous section), we obtain lower and upper bounds on the minimax MISE over classes $\mathcal{C}_{f_0}(K,u,C,r)$ under the simple assumption (A1). Depending on how these bounds compare, we may obtain the minimax rate and possibly the asymptotic constant of the MISE achievable over such a class. However, this is not guaranteed. A crucial step for the lower bound is the computation of $K_{f_0}(V_m)$, which will be possible only for particular smoothness classes. Concerning the upper bound, we will need to find a tractable bound on the variance, and this will only be possible in cases where orthonormal sequences are easily obtained.

In Section 5 these steps will be carried out for power series mixtures, resulting in minimax rates over smoothness classes as defined above. However, we will also give examples of mixands with different characteristics. In the setting of translation mixtures or deconvolution, treated in Section 6, an upper bound applies uniformly over all $f$ in $\mathbb{H}_1$. We will then derive a better adapted lower bound of the same rate. In the setting of mixtures of discrete uniform distributions examined in Section 7, $\Pi_k$ is not in $L^1(\nu)$ for the most natural choice of $\nu$. We will then choose $(V_m)$ different from (13) and adapt the proof of Theorem 1 to this choice. Finally, in Section 8 we give situations in which the comparison between the lower and upper bounds remains an open question.

5. Power series mixtures. Let $(a_k)_{k\ge0}$ be a sequence of positive numbers with $a_0 = 1$, and let $R$, $0 < R \le \infty$, be the radius of convergence of the power series

$$Z(t) := \sum_{k\ge0}a_kt^k.$$

Obviously, $Z(0) = 1$ and $Z$ is an increasing function on $[0,R)$. Put $\bar Z(t) := 1/Z(t)$. For all $\theta \in [0,R)$, the discrete distribution $\pi_\theta$ is defined by

$$\pi_\theta(k) = \Pi_k(\theta) = a_k\theta^k\bar Z(\theta) \quad\text{for all } k \ge 0. \tag{28}$$

In particular, the Poisson and negative binomial distributions are obtained using, respectively, $a_k = 1/k!$ and $a_k = \binom{k+s-1}{k}$ for a fixed $s > 0$. It is without loss of generality to assume $a_0 = 1$, since multiplying $(a_k)$ by a positive constant does not alter $\pi_\theta$ in (28).

Recall that $\mathbb{H} = L^2(\nu)$, where $\nu$ is a Radon measure on $\Theta$; in the case of the above power series mixture, $\Theta$ is a Borel subset of $[0,R)$. Let us first give necessary and sufficient conditions on $\nu$ for our previous results to apply, that is, for assumption (A1) to hold. These conditions are as follows.

Proposition 6. For mixands given by (28), (A1) is equivalent to both of the following assertions:

(i) $\int_\Theta\theta^k\bar Z(\theta)\,\nu(d\theta)$ is finite for all nonnegative integers $k$;

(ii) $\nu$ is not a finite sum of point masses.

Proof. Condition (i) exactly says that $\Pi_k$ is in $L^1(\nu)$ for all $k$. Since $\bar Z$ is bounded by one, it also gives that $\int_\Theta\theta^{2k}\bar Z^2(\theta)\,\nu(d\theta) < \infty$, that is, $\Pi_k$ is in $L^2(\nu)$ for all $k$. Hence, (i) is necessary, and it is sufficient for $(\Pi_k)$ to be a sequence in both $L^1(\nu)$ and $L^2(\nu)$.

We now claim that the sequence $(\Pi_k)_{k\ge0}$ is linearly independent in $\mathbb{H}$ if and only if (ii) holds. First note that if (ii) does not hold, then $\mathbb{H}$ is finite-dimensional and cannot contain an infinite sequence of linearly independent elements. To prove the converse implication, assume that (ii) holds, so that the support of $\nu$ is infinite. Pick a nonnegative integer $p$ and let $(\lambda_k)_{0\le k\le p}$ be scalars such that $\sum_{0\le k\le p}\lambda_k\Pi_k$ is the zero element of $\mathbb{H}$, that is, $\sum_{0\le k\le p}\lambda_k\pi_\theta(k) = 0$ for $\nu$-a.e. $\theta \in \Theta$. Since $\theta \mapsto \pi_\theta(k)$ is continuous on $\Theta$ for all $k$, the set $\{\theta \in \Theta : \sum_{0\le k\le p}\lambda_k\pi_\theta(k) = 0\}$ is closed (in the relative topology on $\Theta$). Consequently, it contains the support of $\nu$ and, thus, by (ii), $p+1$ distinct points $\theta_i \in \Theta$, $i = 0,\dots,p$. As $\bar Z > 0$, it follows that $\sum_{0\le k\le p}\lambda_ka_k\theta_i^k = 0$ for $i = 0,\dots,p$, which in turn implies $\lambda_k = 0$ for $k = 0,\dots,p$, since a nonzero polynomial of degree at most $p$ has at most $p$ roots. This shows that $(\Pi_k)_{0\le k\le p}$ are linearly independent for all $p \ge 0$, which completes the proof. □

The objective of the remainder of this section is to carefully apply Theorem 1 to power series mixtures when $\nu$ is Lebesgue measure on a compact interval, and to find upper bounds on the MISE for the projection estimator. This is organized as follows. We first provide computational expressions for the projection estimator in Section 5.1. We then examine the smoothness classes defined by (14), (15) and (17), and how these classes intersect $\mathbb{H}_1$ (see Section 5.2). In this context and under a submultiplicative assumption on the sequence $(a_k)$, we find that the upper and lower bounds on the MISE have the same rate, the minimax rate. A closer look is taken when $R < \infty$ and also for Poisson mixtures (for which $R = \infty$). These results are stated in Section 5.3, where they are also compared to previous results found in similar settings.

5.1. Computations based on orthonormal polynomials. In this section we 
shall elaborate on the use of orthonormal polynomials in connection with 
the projection estimator and power series mixands. These polynomials will 
serve two purposes: being building blocks for numerical computations of 
the projection estimator and being a mathematical vehicle for establishing 
bounds on its variance. 

The projection estimator may be computed using the techniques of Section 3. More precisely, since $V_m = \mathrm{Span}(\Pi_k, 0 \le k < m)$ and $P_n\mathbf{1}_l$ is the empirical frequency of $l$ in the sample $(X_i)_{1\le i\le n}$, (12) translates into

$$\hat f_{m,n} = \sum_{k=0}^{m-1}\sum_{l=0}^{k}\Phi_{k,l}\,P_n\mathbf{1}_l\,\phi_k = \frac{1}{n}\sum_{k=0}^{m-1}\sum_{i=1}^{n}\Phi_{k,X_i}\,\phi_k; \tag{29}$$

recall that $\Phi_{k,l} := 0$ for $l > k$. In the case of power series mixtures, we may use orthogonal polynomial techniques for constructing the sequence $(\phi_k)$. Let $\mathcal{P}_m$ be the set of polynomials of degree at most $m$ (with the convention $\mathcal{P}_{-1} = \{0\}$). In view of (28),

$$V_m = \{p\bar Z : p \in \mathcal{P}_{m-1}\}. \tag{30}$$

Define the measure $\nu'$ on $\Theta$ by $d\nu' = \bar Z^2\,d\nu$ and let $\mathbb{H}' = L^2(\nu')$. Then for any two polynomials $p$ and $q$, $\langle p\bar Z, q\bar Z\rangle_{\mathbb{H}} = \langle p, q\rangle_{\mathbb{H}'}$. Hence, if $(q_k)_{k\ge0}$ is a sequence of orthonormal polynomials in $\mathbb{H}'$ with

$$q_k(t) = \sum_{l=0}^{k}Q_{k,l}\,t^l, \tag{31}$$

then the sequence $(\phi_k)_{k\ge0}$ defined by

$$\phi_k(t) := q_k(t)\bar Z(t) = \sum_{l=0}^{k}Q_{k,l}\,t^l\bar Z(t) = \sum_{l=0}^{k}\frac{Q_{k,l}}{a_l}\Pi_l(t)$$

is an orthonormal sequence in $\mathbb{H}$ such that $(\phi_k)_{0\le k<m}$ spans $V_m$. Thus, $\Phi_{k,l} = (Q_{k,l}/a_l)\,\mathbb{1}(l \le k)$ in (29). This shows that $\hat f_{m,n}$ is the same estimator as the one defined by Loh and Zhang ([17], equation (18)) with weight function $w = 1$. However, it differs from the one studied by Hengartner [12], since the latter is a polynomial, whereas ours is in $V_m$. The coefficients $Q_{k,l}$ may be obtained using standard methods to compute orthogonal sequences of polynomials; a particular method is described in the Appendix.
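A minimal sketch of (29)-(31) in Python, assuming Poisson mixands ($a_k = 1/k!$, $\bar Z(t) = e^{-t}$) and $\nu$ Lebesgue measure on $[a,b]$. The orthonormal polynomials $q_k$ in $\mathbb H'$ are obtained here from a Cholesky factorization of the moment matrix of $\nu'$, computed by Gauss-Legendre quadrature; this is one standard method and not necessarily the one described in the Appendix. Function names are hypothetical.

```python
import math
import numpy as np

def gauss_nodes(a, b, n_nodes=64):
    """Gauss-Legendre nodes and weights on [a, b]."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    return 0.5 * (b - a) * x + 0.5 * (a + b), 0.5 * (b - a) * w

def projection_estimator_poisson(sample, m, a, b):
    """Return t -> hat f_{m,n}(t) as in (29)-(31), Poisson mixands, nu = Lebesgue on [a, b]."""
    t, w = gauss_nodes(a, b)
    w_prime = w * np.exp(-2.0 * t)                  # quadrature weights for nu' = Zbar^2 nu
    V = np.vander(t, m, increasing=True)            # V[j, l] = t_j ** l
    M = V.T @ (V * w_prime[:, None])                # moments int t^(k+l) dnu'
    Q = np.linalg.inv(np.linalg.cholesky(M))        # q_k(t) = sum_l Q[k, l] t^l, orthonormal in H'
    Phi = Q * np.array([math.factorial(l) for l in range(m)], dtype=float)  # Q[k, l] / a_l
    freq = np.array([np.mean(np.asarray(sample) == l) for l in range(m)])   # P_n 1_l
    coef = Phi @ freq                               # coefficient of phi_k in hat f_{m,n}

    def f_hat(s):
        s = np.asarray(s, dtype=float)
        q = np.polynomial.polynomial.polyval(s, Q.T)   # q[k] = q_k(s)
        return np.exp(-s) * (coef @ q)                 # phi_k = q_k * Zbar with Zbar(s) = exp(-s)

    return f_hat

sample = [0, 1, 1, 2, 3, 0, 1, 5, 2, 1]                # hypothetical observations
f_hat = projection_estimator_poisson(sample, m=4, a=0.5, b=2.0)
```

Since $\Pi^{-1}\Pi_l = \mathbf 1_l$, Definition 1 gives $\langle\hat f_{m,n},\Pi_l\rangle_{\mathbb H} = P_n\mathbf 1_l$ for $l < m$, which is a convenient check of an implementation. The Cholesky route becomes ill-conditioned as $m$ grows, so for larger $m$ a recurrence-based method would be preferable.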

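Let us also derive another estimator, denoted by $\tilde f_{m,n}$ and belonging to the space $\tilde V_m := \{pZ : p \in \mathcal{P}_{m-1}\}$. In analogy with Definition 1, this estimator is defined as the element of $\tilde V_m$ satisfying

$$\langle\tilde f_{m,n}, g\rangle_{\mathbb{H}} = P_n\Pi^{-1}g \quad\text{for all } g \in V_m. \tag{32}$$

Observe that $\langle\tilde f_{m,n}, g\rangle_{\mathbb{H}} = \langle\bar Z\tilde f_{m,n}, Zg\rangle_{\mathbb{H}}$, so that, by (30), (32) is equivalent to

$$\langle\bar Z\tilde f_{m,n}, p\rangle_{\mathbb{H}} = P_n\Pi^{-1}(\bar Zp) \quad\text{for all } p \in \mathcal{P}_{m-1}. \tag{33}$$

Since $P_n\Pi^{-1}(\bar Z\,\cdot)$ is a linear functional on $\mathcal{P}_{m-1}$, this uniquely defines $\bar Z\tilde f_{m,n}$ in $\mathcal{P}_{m-1}$ and thus $\tilde f_{m,n}$.

We see from (32) and (7) that $\hat f_{m,n} = \mathrm{Proj}_{V_m}\tilde f_{m,n}$. Therefore, by linearity, $\hat f_{m,n} - \mathbb{E}_f\hat f_{m,n} = \mathrm{Proj}_{V_m}(\tilde f_{m,n} - \mathbb{E}_f\tilde f_{m,n})$. Since projections do not increase the norm, taking the squared norm and expectation gives

$$\mathrm{var}_f(\hat f_{m,n}) \le \mathrm{var}_f(\tilde f_{m,n}) \quad\text{for all } f \in \mathbb{H}_1. \tag{34}$$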

We will use this property below to bound the variance of $\hat f_{m,n}$. For the moment, let us note that this bound indicates that $\tilde f_{m,n}$ does not behave as well as $\hat f_{m,n}$, even though, for brevity, we leave aside the problem of the bias. Nevertheless, the estimator $\tilde f_{m,n}$ has the appealing property that it may be expressed by using a sequence $(q'_k)_{k\ge0}$ of orthonormal polynomials in $\mathbb{H} = L^2(\nu)$ that does not depend on the sequence $(a_k)$ but only on $\nu$. To see this, let us write, as in (31),

$$q'_k(t) = \sum_{l=0}^{k}Q'_{k,l}\,t^l.$$

Again, by convention, we extend the values of $Q'_{k,l}$ to the domain $l > k$ by zeros. An algorithm for computing $Q'_{k,l}$ is given in the Appendix for $\Theta = [a,b]$ and certain choices of $\nu$. Let us now express $\tilde f_{m,n}$ in terms of this sequence.

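By (33), as $\Pi^{-1}(\bar Z(t)t^l) = \Pi^{-1}(\Pi_l/a_l) = \mathbf{1}_l/a_l$, we obtain

$$\langle\bar Z\tilde f_{m,n}, q'_k\rangle_{\mathbb{H}} = \sum_{l=0}^{k}\frac{Q'_{k,l}}{a_l}P_n\mathbf{1}_l.$$

Since $\bar Z\tilde f_{m,n}$ belongs to $\mathcal{P}_{m-1}$, we conclude that the right-hand side of this display gives the coefficients of the expansion of $\bar Z\tilde f_{m,n}$ in the orthonormal basis $(q'_k)_{0\le k<m}$. Thus,

$$\tilde f_{m,n} = Z\sum_{k=0}^{m-1}\sum_{l=0}^{k}\frac{Q'_{k,l}}{a_l}P_n\mathbf{1}_l\,q'_k = \frac{Z}{n}\sum_{k=0}^{m-1}\sum_{i=1}^{n}\frac{Q'_{k,X_i}}{a_{X_i}}\,q'_k. \tag{35}$$

Below we will use this expression to bound the variance of $\tilde f_{m,n}$ and hence, by (34), that of $\hat f_{m,n}$.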
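Formula (35) can be sketched in the same setting (Poisson mixands, $\nu$ Lebesgue measure on $[a,b]$). Here the coefficients $Q'_{k,l}$ are computed by a Cholesky factorization of the exact moment matrix $(\int_a^b t^{k+l}\,dt)_{k,l}$, which indeed involves only $\nu$ and not $(a_k)$; the numerical method and the function names are our assumptions.

```python
import math
import numpy as np

def orthonormal_poly_coeffs(m, a, b):
    """Q'[k, l]: monomial coefficients of polynomials q'_k orthonormal in L^2([a, b], dt)."""
    M = np.array([[(b ** (k + l + 1) - a ** (k + l + 1)) / (k + l + 1)
                   for l in range(m)] for k in range(m)])
    return np.linalg.inv(np.linalg.cholesky(M))

def tilde_estimator_poisson(sample, m, a, b):
    """Return t -> tilde f_{m,n}(t) from (35) for Poisson mixands (a_l = 1/l!, Z(t) = exp(t))."""
    Qp = orthonormal_poly_coeffs(m, a, b)
    freq = np.array([np.mean(np.asarray(sample) == l) for l in range(m)])   # P_n 1_l
    inv_a = np.array([math.factorial(l) for l in range(m)], dtype=float)    # 1 / a_l
    coef = Qp @ (inv_a * freq)     # <Zbar tilde f, q'_k>_H = sum_l Q'[k, l] P_n 1_l / a_l

    def f_tilde(s):
        s = np.asarray(s, dtype=float)
        q = np.polynomial.polynomial.polyval(s, Qp.T)   # q[k] = q'_k(s)
        return np.exp(s) * (coef @ q)                   # multiply by Z(s) = exp(s)

    return f_tilde

sample = [0, 1, 1, 2, 3, 0, 1, 5, 2, 1]                 # hypothetical observations
f_tilde = tilde_estimator_poisson(sample, m=4, a=0.5, b=2.0)
```

A quick check of such an implementation is (33) with $p(t) = t^l$, $l < m$, whose right-hand side is $P_n\mathbf 1_l/a_l = l!\,P_n\mathbf 1_l$ for Poisson mixands.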
5.2. Approximation classes. Recall the definitions (14), (15) and (17) made in Section 4. These smoothness classes are closely related to those used in previous works on power series mixtures, as will be shown in this section. The discussion will be devoted to the case where $\nu$ is Lebesgue measure on $\Theta = [a,b]$, with $0 \le a < b < R$, and $u = u^\alpha := ((1+m)^{-\alpha})_{m\ge0}$ for some $\alpha > 0$.

It turns out that, for these particular sequences, the classes $\mathcal{C}(u^\alpha,C)$ of Section 4 are equivalent to classes defined using weighted moduli of smoothness. This, in turn, will relate them to Sobolev and Hölder classes, which were considered by Hengartner [12]. To make this precise, let $\|\cdot\|_p$ be the $L^p$ norm over $[a,b]$ (so that $\|\cdot\|_2 = \|\cdot\|_{\mathbb{H}}$), define the function $\varphi(x) = \sqrt{(x-a)(b-x)}$ on this interval and let $\Delta^r_h(f,x)$ be the symmetric difference of order $r$, that is,

$$\Delta^r_h(f,x) := \sum_{i=0}^{r}(-1)^{r-i}\binom{r}{i}f\big(x + (i - r/2)h\big),$$

with the classical convention that $\Delta^r_h(f,x)$ is set to 0 if $x + (i - r/2)h$ is outside $[a,b]$ for $i = 0$ or $r$. Then for any function $f$ on $[a,b]$, the weighted modulus of smoothness is defined as

$$\omega^\varphi_r(f,t)_p := \sup_{0<h\le t}\big\|\Delta^r_{h\varphi(\cdot)}(f,\cdot)\big\|_p. \tag{36}$$

The effect of the weight $\varphi$ here is to relax the regularity conditions on $f$ at the endpoints $a$ and $b$. Finally, for all positive numbers $\alpha$ and $C$, define the classes

$$\mathcal{C}(\alpha,C) := \big\{f \in \mathbb{H} : \|f\|_{\mathbb{H}} \le C,\ \omega^\varphi_{[\alpha]+1}(f,t)_2 \le Ct^\alpha \text{ for all } t > 0\big\}. \tag{37}$$
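A crude numerical approximation of the weighted modulus (36) can help build intuition about the classes (37). The plain-Python sketch below replaces the supremum over $h$ by a maximum over a finite grid and the $L^2$ norm by a midpoint rule; both discretizations, the grid sizes and the function names are our assumptions.

```python
import math

def sym_diff_bounded(f, x, step, r, a, b):
    """Delta^r_step(f, x), set to 0 when x - r*step/2 or x + r*step/2 falls outside [a, b]."""
    if x - r * step / 2 < a or x + r * step / 2 > b:
        return 0.0
    return sum((-1) ** (r - i) * math.comb(r, i) * f(x + (i - r / 2) * step)
               for i in range(r + 1))

def weighted_modulus(f, t, r, a, b, n_h=50, n_x=400):
    """Grid approximation of omega^phi_r(f, t)_2 = sup_{0<h<=t} ||Delta^r_{h phi(.)}(f, .)||_2 on [a, b]."""
    dx = (b - a) / n_x
    xs = [a + (j + 0.5) * dx for j in range(n_x)]       # midpoint-rule nodes
    phi = [math.sqrt((x - a) * (b - x)) for x in xs]   # weight phi(x) = sqrt((x-a)(b-x))
    best = 0.0
    for i in range(1, n_h + 1):
        h = t * i / n_h                                  # grid over 0 < h <= t
        sq = sum(sym_diff_bounded(f, x, h * p, r, a, b) ** 2 for x, p in zip(xs, phi)) * dx
        best = max(best, math.sqrt(sq))
    return best
```

For instance, a polynomial of degree less than $r$ has vanishing $r$th differences, so its approximate modulus is zero up to rounding, while $x^2$ with $r = 2$ gives a strictly positive value.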



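The following result shows that these classes are, in a certain sense, equivalent to $\mathcal{C}(u^\alpha,C)$.

Proposition 7. For any positive number $\alpha$, there exist positive constants $C_1$ and $C_2$ such that

$$\mathcal{C}(u^\alpha,C_1C) \subseteq \mathcal{C}(\alpha,C) \subseteq \mathcal{C}(u^\alpha,C_2C) \quad\text{for all } C > 0. \tag{38}$$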

Before giving the proof of this proposition, we explain the point of this result and of defining the classes $\mathcal{C}(\alpha,C)$. Recall that the standard modulus of smoothness

$$\omega_r(f,t)_p := \sup_{0<h\le t}\|\Delta^r_h(f,\cdot)\|_p$$

provides definitions of semi-norms for Besov and Sobolev spaces (see, resp., equation (2.10.1) and Theorem 2.9.3 in [4]). In particular, the classes defined as in (37) but with the weighted modulus of smoothness (36) replaced by the standard one with the same parameters $r = [\alpha]+1$ and $p = 2$ are balls in the Besov space $B^\alpha_\infty(L^2[a,b])$. Now using Theorems 6.2.4 and 6.6.2 in [4] together with an inequality from [4], equation (6.6.5), we have that, for a constant $C_0 > 0$ depending only on $p$ and $r$,

$$\omega^\varphi_r(f,t)_p \le C_0\,\omega_r(f,t)_p \quad\text{for } 0 < t \le (2r)^{-1}.$$

Furthermore, bounding $\omega^\varphi_r(f,t)_p$ by $\|f\|_p$, up to a multiplicative constant, for $t > (2r)^{-1}$ as in [4], equation (6.6.5), shows that $\mathcal{C}(\alpha,C)$ contains the Besov balls $\{f : \|f\|_{B^\alpha_\infty(L^2[a,b])} \le C'_0C\}$ for a constant $C'_0$, but is not contained in such balls. Using inequalities between Hölder, Sobolev and Besov semi-norms, it also follows that $\mathcal{C}(\alpha,C)$ contains balls of the Hölder space $C^\alpha[a,b]$ and of the Sobolev space $W^\alpha_2$, whereas the converse inclusions do not hold. In view of Proposition 7, since $\mathcal{C}(\alpha,C)$ contains Besov, Hölder and Sobolev balls as just described, so does $\mathcal{C}(u^\alpha,C)$.

These inclusions are helpful for comparing our results to those of Hengartner [12], where minimax rates are given for Sobolev balls with integer exponents and conjectured for Hölder balls. In his paper, as well as in the present one, rates for the projection estimator are obtained using properties which hold over the classes $\mathcal{C}(\alpha,C)$ and, consequently, over smaller ones such as Sobolev and Hölder balls. Minimax bounds, however, are obtained using different methods. Our approach takes advantage of the whole class over which the rate applies, whereas Hengartner [12] only used subclasses to derive minimax bounds. This "closer look" allows us to derive minimax bounds applying to $\mathcal{C}(\alpha,C)$ for all $\alpha > 1$, not only integers, and to obtain results on the asymptotic constant when refining the class $\mathcal{C}(\alpha,C)$ to $\mathcal{C}(u^\alpha,C,r)$.

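Proof of Proposition 7. Write the equivalence relationships (38) as $\mathcal{C}(u^\alpha,\cdot) \asymp \mathcal{C}(\alpha,\cdot)$. We start by relating $\mathcal{C}(\alpha,C)$ to classes of the form

$$\mathcal{C}'(u,C) := \Big\{f \in \mathbb{H} : \inf_{p\in\mathcal{P}_{m-1}}\|f - p\|_{\mathbb{H}} \le Cu_m \text{ for all } m \ge 0\Big\};$$

recall that $\mathcal{P}_m$ is the set of polynomials of degree at most $m$. Theorem 8.7.3 and equation (8.7.25) of [4] show that, for all $\alpha > 0$ and all $r > \alpha$, there exist constants $C'_1$ and $C'_2$ such that, for all $C > 0$ and $f \in \mathbb{H}$,

$$\sup_{t\ge r}\omega^\varphi_r(f,1/t)_2\,t^\alpha \le C \implies \sup_{m\ge r}\inf_{p\in\mathcal{P}_m}\|f - p\|_{\mathbb{H}}\,m^\alpha \le C'_1C,$$

$$\sup_{m\ge r}\inf_{p\in\mathcal{P}_m}\|f - p\|_{\mathbb{H}}\,m^\alpha \le C \implies \sup_{t\ge r}\omega^\varphi_r(f,1/t)_2\,t^\alpha \le C'_2C.$$

Here $C'_1$ and $C'_2$ may depend on $\alpha$ and $r$. Taking $r = [\alpha]+1$, observing that $(1+m)^\alpha \asymp (m-1)^\alpha$ for $m \ge 2$ and using $\|f\|_{\mathbb{H}}$ for bounding $\inf_{p\in\mathcal{P}_{m-1}}\|f - p\|_{\mathbb{H}}$ and $\omega^\varphi_r(f,1/t)_2$ in cases not covered by the above implications, we obtain $\mathcal{C}(\alpha,\cdot) \asymp \mathcal{C}'(u^\alpha,\cdot)$. Thus, also $\bar Z\mathcal{C}(\alpha,\cdot) \asymp \bar Z\mathcal{C}'(u^\alpha,\cdot)$, where $\bar Z\mathcal{C}(\alpha,C) := \{\bar Zf : f \in \mathcal{C}(\alpha,C)\}$ and so on.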

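Next we proceed to study $\mathcal{C}(u^\alpha,C)$. By (14) and (30),

$$\mathcal{C}(u^\alpha,C) = \Big\{f \in \mathbb{H} : \inf_{p\in\mathcal{P}_{m-1}}\|f - p\bar Z\|_{\mathbb{H}} \le Cu^\alpha_m \text{ for all } m \ge 0\Big\}.$$

Since $\bar Z$ is positive and decreasing on $[a,b]$,

$$\bar Z(b)\|f\|_{\mathbb{H}} \le \|\bar Zf\|_{\mathbb{H}} \le \bar Z(a)\|f\|_{\mathbb{H}} \quad\text{for all } f \text{ in } \mathbb{H}. \tag{39}$$

This shows that $\|f - p\bar Z\|_{\mathbb{H}} \asymp \|Zf - p\|_{\mathbb{H}}$, whence $\mathcal{C}(u^\alpha,\cdot) \asymp \bar Z\mathcal{C}'(u^\alpha,\cdot)$. Recalling that $\bar Z\mathcal{C}(\alpha,\cdot) \asymp \bar Z\mathcal{C}'(u^\alpha,\cdot)$, we thus see that in order to prove (38) it is sufficient to show $\bar Z\mathcal{C}(\alpha,\cdot) \asymp \mathcal{C}(\alpha,\cdot)$.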

The remainder of the proof is thus devoted to showing that there are constants $C''_1$ and $C''_2$ such that $f \in \mathcal{C}(\alpha,C)$ implies $\bar Zf \in \mathcal{C}(\alpha,C''_1C)$ and $Zf \in \mathcal{C}(\alpha,C''_2C)$. Since $b < R$, $Z$ is bounded away from zero and infinity on $[a,b]$, and both $Z$ and $\bar Z$ are thus infinitely continuously differentiable on this interval. Having made this observation, both of the desired implications follow from the claim that, for any $([\alpha]+1)$ times continuously differentiable function $g$ on $[a,b]$, there exists $c > 0$ such that

$$f \in \mathcal{C}(\alpha,C) \implies gf \in \mathcal{C}(\alpha,cC). \tag{40}$$

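To prove this claim, pick an $f$ in $\mathcal{C}(\alpha,C)$ and let $r := [\alpha]+1$. Recalling that $\mathcal{C}(\alpha,\cdot) \asymp \mathcal{C}'(u^\alpha,\cdot)$, we see that the union $\bigcup_{C>0}\mathcal{C}(\alpha,C)$ coincides with $\bigcup_{C'>0}\mathcal{C}'(u^\alpha,C')$ and is, hence, increasing as $\alpha$ decreases. More precisely, since $\mathcal{C}'(u^\alpha,C) \subseteq \mathcal{C}'(u^{\alpha'},C)$ for $\alpha' \le \alpha$, there exists a positive $c'$, depending only on $\alpha$, such that $\mathcal{C}(\alpha,C) \subseteq \mathcal{C}(\alpha - r + i, c'C)$ for all $i = 1,\dots,r$. Since $f$ is included in all these classes and $r = [\alpha]+1$, we find that $\omega^\varphi_i(f,t)_2 = \omega^\varphi_{[\alpha-r+i]+1}(f,t)_2 \le c'C\,t^{\alpha-r+i}$ for these $i$, and also for $i = 0$ with the usual convention $\omega^\varphi_0(f,t)_2 := \|f\|_{\mathbb{H}}$.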

Now the equality (obtained by standard algebra)

$$\Delta^r_h(fg,x) = \sum_{i=0}^{r}\binom{r}{i}\Delta^i_h\big(f, x + (r-i)h/2\big)\,\Delta^{r-i}_h\big(g, x - ih/2\big)$$

and the bound $|\Delta^{r-i}_h(g, x - ih/2)| \le \|g^{(r-i)}\|_{L^\infty[a,b]}(rh)^{r-i}$ for all $0 \le i \le r$ and $x \in [a,b]$ yield

$$\omega^\varphi_r(fg,t)_2 \le M_g\sum_{i=0}^{r}\binom{r}{i}\omega^\varphi_i(f,t)_2\,(rt)^{r-i},$$

where $M_g := \max_{0\le j\le r}\|g^{(j)}\|_{L^\infty[a,b]}$. Since, as shown above, $\omega^\varphi_i(f,t)_2\,t^{r-i} \le c'Ct^\alpha$, the claim (40) follows with $c = c'(1+r)^rM_g$. □
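The product formula for symmetric differences used in this proof can be checked numerically for a fixed step $h$; in the weighted modulus the step is $h\varphi(x)$, but for each fixed $x$ this is again a fixed step. The plain-Python sketch below ignores the boundary convention and compares both sides for two smooth test functions chosen arbitrarily; the function names are ours.

```python
import math

def sym_difference(f, x, h, r):
    """Delta^r_h(f, x) = sum_i (-1)^(r-i) C(r, i) f(x + (i - r/2) h), without boundary convention."""
    return sum((-1) ** (r - i) * math.comb(r, i) * f(x + (i - r / 2) * h) for i in range(r + 1))

def product_rule_rhs(f, g, x, h, r):
    """sum_i C(r, i) Delta^i_h(f, x + (r - i) h / 2) Delta^{r-i}_h(g, x - i h / 2)."""
    return sum(math.comb(r, i)
               * sym_difference(f, x + (r - i) * h / 2, h, i)
               * sym_difference(g, x - i * h / 2, h, r - i)
               for i in range(r + 1))

f, g = math.sin, math.exp          # arbitrary smooth test functions
x, h = 0.3, 0.1
errors = [abs(sym_difference(lambda y: f(y) * g(y), x, h, r) - product_rule_rhs(f, g, x, h, r))
          for r in range(7)]
```

The entries of `errors` are at the level of floating-point rounding for every order $r$ tested, as the identity is exact.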

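Let us now consider the smoothness classes $\mathcal{C}_{f_0}(K,u,C,r)$ defined in (17). We take $f_0$ such that, for two positive constants $c_1$ and $c_2$,

$$c_1 \le f_0(t) \le c_2 \quad\text{for all } t \in [a,b]. \tag{41}$$

Under this condition the norm (16) satisfies

$$\frac{1}{c_2}\operatorname*{ess\,sup}_{a\le t\le b}|f(t)| \le \|f\|_{\infty,f_0} \le \frac{1}{c_1}\operatorname*{ess\,sup}_{a\le t\le b}|f(t)|. \tag{42}$$

Thus, as in (38), the classes defined by (17) are equivalent to classes defined by the weighted modulus of smoothness and a bound on the sup norm. Indeed, with $\mathcal{C}(K,\alpha,C) := \{f \in \mathcal{C}(\alpha,C) : \operatorname{ess\,sup}_{a\le t\le b}|f(t)| \le K\}$,

$$\mathcal{C}_{f_0}(K/c_2,u^\alpha,C_1C) \subseteq \mathcal{C}(K,\alpha,C) \subseteq \mathcal{C}_{f_0}(K/c_1,u^\alpha,C_2C). \tag{43}$$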

Another important consequence of (42) is that for bounding $K_{f_0}(V_m)$ we may use the Nikolskii inequality (see, e.g., [4], Theorem 4.2.6). This inequality states that there is a positive constant $C$, depending only on the length $b - a$, such that, for all nonnegative integers $m$,

$$\sup\Big\{\sup_{a\le t\le b}|p(t)| : p \in \mathcal{P}_{m-1},\ \int_a^b|p(t)|^2\,dt = 1\Big\} \le Cm.$$

Also recall (30), that is, $V_m = \{p\bar Z : p \in \mathcal{P}_{m-1}\}$. Combining these observations with (39) and the Nikolskii inequality yields, for all $m \ge 1$,

$$K_{f_0}(V_m) \le C_{a,b}\,m \tag{44}$$

for a positive constant $C_{a,b}$ depending only on $(a_k)$, $a$, $b$ and $c_2$.
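For Lebesgue measure on $[a,b]$ the Nikolskii constant can be computed exactly: by the Cauchy-Schwarz inequality, the inner supremum at $t$ equals $(\sum_{k<m}q_k(t)^2)^{1/2}$ for any orthonormal basis $(q_k)_{0\le k<m}$ of $\mathcal P_{m-1}$, and for normalized shifted Legendre polynomials this is largest at the endpoints, where it equals $m/\sqrt{b-a}$. The Python sketch below evaluates it on a grid; it illustrates both the linear growth in $m$ and the dependence of the constant on $b - a$. The grid size and function name are our choices.

```python
import numpy as np

def nikolskii_ratio(m, a, b, n_grid=2001):
    """max over t in [a, b] of sup{|p(t)| : p in P_{m-1}, ||p||_{L^2[a,b]} = 1}.

    Uses q_k(t) = sqrt((2k+1)/(b-a)) P_k(s), s = (2t - a - b)/(b - a), which are orthonormal
    in L^2([a, b], dt); the inner supremum at t is sqrt(sum_k q_k(t)^2).
    """
    t = np.linspace(a, b, n_grid)
    s = (2.0 * t - a - b) / (b - a)                 # map [a, b] onto [-1, 1]
    total = np.zeros_like(t)
    for k in range(m):
        c = np.zeros(k + 1)
        c[k] = 1.0                                  # coefficient vector selecting P_k
        qk = np.sqrt((2 * k + 1) / (b - a)) * np.polynomial.legendre.legval(s, c)
        total += qk ** 2
    return float(np.sqrt(total.max()))

ratios = [nikolskii_ratio(m, 0.5, 2.0) for m in range(1, 9)]
```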

The following useful result says how the class $\mathcal{C}(\alpha,C)$ intersects $\mathbb{H}_1$.

Lemma 3. Let $\alpha$ be a positive number. If $C < 1/\sqrt{b-a}$, then the intersection of $\mathcal{C}(\alpha,C)$ with $\mathbb{H}_1$ is empty. Furthermore, the intersection of $\mathcal{C}(\alpha,1/\sqrt{b-a})$ with $\mathbb{H}_1$ is the singleton set $\{\mathbb{1}_{[a,b]}/(b-a)\}$.

Proof. Pick $f$ in $\mathbb{H}_1$. Applying Jensen's inequality $\mathbb{E}g(Y) \ge g(\mathbb{E}Y)$ with $g(t) = t^2$, $Y = f$ and the probability measure $dt/(b-a)$ on $[a,b]$ gives

$$\int_a^b f^2(t)\,\frac{dt}{b-a} \ge \Big(\int_a^b f(t)\,\frac{dt}{b-a}\Big)^2 = \frac{1}{(b-a)^2},$$

so that $\|f\|_{\mathbb{H}} \ge 1/\sqrt{b-a}$. This proves the first part of the lemma. Now, by the strict convexity of the square function, equality in the above relation implies

ESTIMATION OF MIXING DENSITIES 






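that $f$ is constant. Thus, to prove the second part of the lemma, we need to check that the uniform density $\mathbb{1}_{[a,b]}/(b-a)$ belongs to $\mathcal{C}(\alpha,1/\sqrt{b-a})$ for all $\alpha > 0$. This is trivially true since its $\mathbb{H}$-norm equals $1/\sqrt{b-a}$ and $\omega^\varphi_k(\mathbb{1}_{[a,b]},t)_2 = 0$ for all $t > 0$ and $k > 0$. □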

We conclude this section with a remark on the somewhat more general case when $\Theta = [a,b]$ and $\nu(dt) = dt/w(t)$ for a weight function $w$, investigated by Loh and Zhang [17]. In this case the classes denoted by $\mathcal{G}(\alpha,m,M,w_0)$ in [17] are related to the classes $\mathcal{C}(u^\alpha,C,m)$ as follows:

$$\mathcal{G}(\alpha,m,M,w_0) \subseteq \mathcal{C}(u^\alpha,M_1M,m+1) \quad\text{for all } \alpha > 0,$$

$$\mathcal{G}(\alpha',m,M,w_0) \supseteq \mathcal{C}(u^\alpha,M_2M,m+1) \quad\text{for all } \alpha > \alpha' > 0,$$

for positive constants $M_1$ and $M_2$ depending on $m$, $\alpha$ and $\alpha'$. Hence, our setting is very close to the one adopted by Loh and Zhang [17] in their Section 3, where they provide lower and upper bounds on the MISE over these classes. However, all their results in this section rely on special conditions, namely, their (19) and (20), which imply restrictions on the parameters $\alpha$ and $m$ defining the classes $\mathcal{G}(\alpha,m,M,w_0)$ (see their Remark 3). In particular, rate optimality in these classes is only obtained for integer $\alpha$ (we refer to the closing comments of Section 3 in [17]). As remarked above, a similar restriction applies in [12], where minimax rates are proved in Sobolev classes with integer exponents. In contrast, the lower bound of Theorem 1 will provide the minimax rate for all $\alpha > 1$ in our classes, and we will also obtain results on the asymptotic constant.

5.3. Minimax MISE rates. The following result is concerned with the 
asymptotic properties of the projection estimator and lower bounds on the 
MISE over the approximation classes of Section 5.2. 

Theorem 3. Assume that $\nu$ is Lebesgue measure on $[a,b]$ with $0 \le a < b < R$, and let $A := \gamma + \sqrt{\gamma^2 - 1}$ with $\gamma = (2 + a + b)/(b - a)$. Then the following assertions hold true:

(a) Let $\alpha$ and $C$ be positive numbers, $r$ be a nonnegative integer and $(m_n)$ be a nondecreasing divergent integer sequence. If there exists a number $A_1$ larger than $A$ such that

$$\frac{1}{n}A_1^{2m_n}\max_{0\le k<m_n}\frac{b^k}{a_k} \to 0 \quad\text{as } n \to \infty, \tag{45}$$

then

$$\sup_{f\in\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m_n,n} - f\|_{\mathbb{H}}^2 \le C^2m_n^{-2\alpha}(1 + o(1)).$$






F. ROUEFF AND T. RYDEN 



(b) Let $\alpha > 1$, $C$ be a positive number, $r$ be a positive integer and $(m'_n)$ be a nondecreasing divergent integer sequence. Put

$$w_n := n\sum_{k\ge m'_n}\nu\Pi_k \quad\text{for any positive integer } n. \tag{46}$$

If $(w_n)$ tends to zero, then

$$\inf_{\hat f\in\mathcal{S}_n}\ \sup_{f\in\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f - f\|_{\mathbb{H}}^2 \ge C^2m_n'^{-2\alpha}(1 + o(1)). \tag{47}$$

(c) Let $\alpha > 1$ and $C > 1/\sqrt{b-a}$. If there exist sequences $(m_n)$ and $(m'_n)$ satisfying the conditions of (a) and (b) and such that $\liminf_{n\to\infty}m_n/m'_n > 0$, then the minimax MISE rate over $\mathcal{C}(\alpha,C)\cap\mathbb{H}_1$ is $m_n^{-2\alpha}$, and it is achieved by the projection estimator $\hat f_{m_n,n}$.
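The constant $A$ and the quantity $\frac1n A_1^{2m}\max_{0\le k<m}b^k/a_k$ appearing in (45) are straightforward to evaluate. The Python sketch below works in log scale to avoid overflow; the Poisson example ($a_k = 1/k!$), the values of $n$, $m$ and $A_1$, and the helper names are illustrative assumptions.

```python
import math

def growth_constant(a, b):
    """A = gamma + sqrt(gamma^2 - 1) with gamma = (2 + a + b) / (b - a), as in Theorem 3."""
    gamma = (2.0 + a + b) / (b - a)
    return gamma + math.sqrt(gamma * gamma - 1.0)

def condition_45_quantity(n, m, A1, b, log_a):
    """(1/n) A1^(2m) max_{0<=k<m} b^k / a_k in log scale; log_a(k) returns log a_k (needs m >= 1, b > 0)."""
    log_max = max(k * math.log(b) - log_a(k) for k in range(m))
    return math.exp(2 * m * math.log(A1) + log_max - math.log(n))

A = growth_constant(0.5, 2.0)                  # gamma = 3 for [a, b] = [0.5, 2]
log_poisson = lambda k: -math.lgamma(k + 1)    # Poisson mixands: log a_k = -log k!
q = condition_45_quantity(10 ** 6, 3, 6.0, 2.0, log_poisson)   # A_1 = 6 > A
```

Here $A = 3 + \sqrt 8$, and the quantity equals $6^6 \cdot 8/10^6$ since the maximum of $2^k k!$ over $k < 3$ is $8$.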

Before giving the proof in Section 5.4, we make the following remarks and examine the particular cases of mixands with $R < \infty$ and Poisson mixtures:

(i) The condition on $C$ in (c) is necessary since otherwise the class is empty or reduces to one element; see Lemma 3. In contrast, under the assumptions of (c), a direct application of (a) and (b) provides the same minimax rate over $\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1$ for $\alpha > 1$, $r \ge 1$ and $C > 0$.

(ii) The same lower and upper bounds apply to classes defined by adding a bound on the uniform norm. For instance, part (c) holds when replacing $\mathcal{C}(\alpha,C)$ by $\mathcal{C}(K,\alpha,C)$ for any $K > 1/(b-a)$ (for $K \le 1/(b-a)$, this class is empty or reduces to the uniform density, as for too small a $C$). This can be verified easily by reading the proof.

(iii) The $o$-terms in parts (a) and (b) can be made more precise. In part (a), $m_n^{-2\alpha}(1 + o(1))$ can be replaced by $(m_n+1)^{-2\alpha} + \kappa\zeta^{-m_n}$, where $\zeta > 1$ and $\kappa > 0$ depend only on $(a_k)$, $a$ and $b$, and the inequality holds for $m_n \ge r$. In part (b), $m_n'^{-2\alpha}(1 + o(1))$ can be replaced by $(m'_n+2)^{-2\alpha}\exp(-\kappa'w_n)$, where $\kappa' > 0$ depends only on $(a_k)$, $a$ and $b$, and the inequality holds for $n$ sufficiently large.

(iv) The estimator $\hat f_{m_n,n}$ in (a) depends only on $(m_n)$, which is fixed by $(a_k)$, $a$ and $b$ through (45). Thus, it is universal in the sense that it does not depend on $C$ or $\alpha$, although, under the conditions of (c), it is rate optimal in $\mathcal{C}(\alpha,C)\cap\mathbb{H}_1$ for all $\alpha > 1$ and all $C > 1/\sqrt{b-a}$. However, an interesting problem, which is left open in the present work, would be to build an estimator which adapts to an unknown $[a,b] \subseteq [0,R)$.

(v) Clearly, one can always find two sequences $(m_n)$ and $(m'_n)$ satisfying the conditions of (a) and (b), respectively. In contrast, for obtaining (c), it remains to show that these sequences can be chosen of the same order. This requires further conditions on the asymptotics of $(a_k)$.
A condition addressing the issue of item (v) above is the following:

(A2) There exists a positive constant $c_0$ such that $a_{k+l} \le c_0a_ka_l$ for all nonnegative integers $k$ and $l$.

This condition holds in various important situations including Poisson and negative binomial mixands. In Proposition 8 below we show that it ensures that Theorem 3(c) applies. The condition says that $(a_k)$ is submultiplicative up to the constant $c_0$. Mimicking the argument of the subadditive lemma (see, e.g., [3], page 231, and Exercise 6.1.17, page 235), $L = \lim_{n\to\infty}(\log a_n)/n$ exists and is given by $L = \inf_{n\ge1}(\log c_0 + \log a_n)/n$. Thus, $c_0a_n \ge e^{nL}$ for all $n \ge 1$. Note that $L = -\infty$ is possible. Since $a_n = O(e^{n(L+\varepsilon)})$ and $e^{nL} = O(a_n)$ for all positive $\varepsilon$, $L$ is related to the radius of convergence through the relation $Re^L = 1$, that is, $L = -\infty$ if and only if $R = \infty$, and $L = -\log R$ otherwise. In addition, for $R < \infty$, we see that the series $\sum_ka_k\theta^k$ is divergent at $\theta = R$.
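Assumption (A2) and the limit $L$ can be explored numerically on a finite range of indices. The sketch below, an illustration under our own truncation choices, computes the smallest $c_0$ that works for $0 \le k,l < K$ and the truncated infimum $\min_{1\le n\le N}(\log c_0 + \log a_n)/n$. For Poisson mixands ($a_k = 1/k!$) and negative binomial mixands with $s = 2$ ($a_k = k+1$), both examples give $c_0 = 1$.

```python
import math

def submultiplicativity_constant(log_a, K):
    """Smallest c_0 with a_{k+l} <= c_0 a_k a_l for 0 <= k, l < K (finite check of (A2))."""
    return max(math.exp(log_a(k + l) - log_a(k) - log_a(l))
               for k in range(K) for l in range(K))

def L_estimate(log_a, c0, N):
    """Truncated version of L = inf_{n >= 1} (log c_0 + log a_n) / n."""
    return min((math.log(c0) + log_a(n)) / n for n in range(1, N + 1))

log_poisson = lambda k: -math.lgamma(k + 1)   # a_k = 1/k!            (R = infinity)
log_nb2 = lambda k: math.log(k + 1)           # a_k = k + 1, s = 2     (R = 1)
```

For $a_k = k+1$ the truncated infimum decreases slowly towards $L = 0 = -\log R$, while for Poisson mixands it keeps decreasing as $N$ grows, in line with $L = -\infty$.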

A first simple application of this assumption is the following lemma.

Lemma 4. Under (A2), we have, for any nonnegative integer $m$,

$$\sum_{k\ge m}\nu\Pi_k \le c_0a_m\int t^m\,\nu(dt).$$

Proof. A direct application of (A2) with (28) shows that, for all $k \ge m$, $\Pi_k(t) \le c_0a_mt^m\Pi_{k-m}(t)$. Thus, by monotone convergence and the observation $\sum_{k\ge0}\Pi_k = 1$,

$$\sum_{k\ge m}\nu\Pi_k = \nu\sum_{k\ge m}\Pi_k \le c_0a_m\int t^m\sum_{k\ge0}\Pi_k(t)\,\nu(dt) = c_0a_m\int t^m\,\nu(dt). \qquad\square$$
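Lemma 4 can be checked numerically for Poisson mixands, for which (A2) holds with $c_0 = 1$ because $\binom{k+l}{k} \ge 1$, and $\nu$ Lebesgue measure on $[a,b]$. In that case $\sum_{k\ge m}\nu\Pi_k = \int_a^b P(N_t \ge m)\,dt$, with $N_t$ Poisson distributed with mean $t$. The sketch uses a midpoint rule and a truncated tail sum; both are our approximations, and the function names are hypothetical.

```python
import math

def poisson_tail(t, m, extra=60):
    """P(N >= m) for N ~ Poisson(t), t > 0, summing the terms k = m, ..., m + extra - 1."""
    return sum(math.exp(-t + k * math.log(t) - math.lgamma(k + 1)) for k in range(m, m + extra))

def tail_mass(m, a, b, n_x=500):
    """sum_{k >= m} nu Pi_k = int_a^b P(N_t >= m) dt for nu = Lebesgue on [a, b] (midpoint rule)."""
    dx = (b - a) / n_x
    return sum(poisson_tail(a + (j + 0.5) * dx, m) for j in range(n_x)) * dx

def lemma4_bound(m, a, b, c0=1.0):
    """c_0 a_m int_a^b t^m dt with a_m = 1/m!."""
    return c0 * (b ** (m + 1) - a ** (m + 1)) / ((m + 1) * math.factorial(m))

pairs = [(tail_mass(m, 0.5, 2.0), lemma4_bound(m, 0.5, 2.0)) for m in range(1, 9)]
```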

Proposition 8. Under (A2) there exist a sequence $(m_n)_{n\ge0}$ satisfying the conditions of Theorem 3(a) and a number $\eta > 1$ such that, by setting $m'_n := [\eta m_n]$, the sequence $(m'_n)_{n\ge0}$ satisfies the conditions of Theorem 3(b). Hence, Theorem 3(c) applies.

We note that, with $m'_n := [\eta m_n]$, the asymptotic constants of parts (a) and (b) of Theorem 3 differ by a factor $\eta^{2\alpha}$. Thus, up to this factor, the projection estimator is minimax MISE efficient over classes $\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1$ with $\alpha > 1$ and $r \ge 1$. How large $\eta$ needs to be taken depends on the model through $(a_k)$, $a$ and $b$. The following result is a sharpened version of Proposition 8 in the case where $R < \infty$, obtained by optimizing with respect to $\eta$. We refer to the proof for details. The proof of Proposition 8 in the case where $R = \infty$ is postponed to Section 5.5. It provides an explicit, although more involved, construction of $(m_n)$ and $\eta$ in a way that makes $\eta > 2$ necessary.
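Corollary 1. Let $\nu$ be Lebesgue measure on $[a,b]$ with $0 \le a < b < R$. Assume that $R$ is finite and that (A2) holds. Let $\alpha > 1$ and $C > 1/\sqrt{b-a}$. Then the minimax MISE rate over $\mathcal{C}(\alpha,C)\cap\mathbb{H}_1$ is $(\log n)^{-2\alpha}$. This rate is achieved by the projection estimator $\hat f_{m_n,n}$ with $m_n := [\tau\log n]$ for any positive $\tau$ less than $1/\log\{A^2(1\vee bR)\}$. Moreover, if $\alpha > 1$, then for any positive $C$ and any positive integer $r$,

$$\limsup_{n\to\infty}(\log n)^{2\alpha}\sup_{f\in\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m_n,n} - f\|_{\mathbb{H}}^2 \le \tau^{-2\alpha}C^2 \tag{48}$$

and

$$\liminf_{n\to\infty}(\log n)^{2\alpha}\inf_{\hat f\in\mathcal{S}_n}\ \sup_{f\in\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f - f\|_{\mathbb{H}}^2 \ge (\log(R/b))^{2\alpha}C^2. \tag{49}$$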

Proof. Put $\tau_{\max} := 1/\log\{A^2(1\vee bR)\}$ and consider (45). We will make use of the properties derived above from (A2). First we note that, since $a_k \ge c_0^{-1}e^{kL} = c_0^{-1}R^{-k}$ for all $k$, we have

$$\frac{1}{n}A_1^{2m}\max_{0\le k<m}\frac{b^k}{a_k} \le \frac{1}{n}A_1^{2m}c_0\max_{0\le k<m}(bR)^k = \frac{1}{n}A_1^{2m}c_0(1\vee bR)^{m-1}.$$

Thus, the log of the left-hand side of this display is at most $m\log(A_1^2(1\vee bR)) - \log n + \log c_0$, so that (45) holds for $m_n = [\tau\log n]$ with $\tau\log(A_1^2(1\vee bR)) < 1$. Combining this condition with the requirement $A_1 > A$, we obtain the bound $\tau < \tau_{\max}$. Hence, for such $(m_n)$, Theorem 3(a) applies and gives (48).

Now consider part (b) of Theorem 3. Lemma 4 shows that $\sum_{k\ge m}\nu\Pi_k = O(a_mb^m)$. Moreover, for any $\varepsilon \in (0,1)$, it holds that $a_m \le (e^L/(1-\varepsilon))^m = (R(1-\varepsilon))^{-m}$ for large $m$. Thus, with $m'_n = [\eta m_n]$ and $(w_n)$ defined as in (46), $\log w_n \le \log n + \eta m_n\log(b/(R(1-\varepsilon)))$ up to an additive constant. Hence, we may choose $\eta > -1/(\tau\log(b/(R(1-\varepsilon)))) > 0$ such that $w_n \to 0$. This completes the proof of the minimax MISE rate, by applying Theorem 3(c) with the chosen sequences $(m_n)$ and $(m'_n)$. Theorem 3(b) gives a lower bound on the MISE asymptotically equivalent to $C^2(\eta\tau\log n)^{-2\alpha}$. Optimizing with respect to $\eta\tau$ under the above constraints and letting $\varepsilon$ tend to zero gives (49). □
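For a concrete instance of Corollary 1, take geometric mixands, $a_k = 1$ for all $k$, so that $Z(t) = 1/(1-t)$, $R = 1$ and (A2) holds with $c_0 = 1$. The Python sketch computes $\tau_{\max}$, the resulting $m_n = [\tau\log n]$ and the constants appearing in (48) and (49), both up to the factor $C^2$; the values of $a$, $b$, $\alpha$, $n$ and the choice $\tau = 0.9\,\tau_{\max}$ are arbitrary illustrations.

```python
import math

def growth_constant(a, b):
    """A = gamma + sqrt(gamma^2 - 1) with gamma = (2 + a + b) / (b - a)."""
    gamma = (2.0 + a + b) / (b - a)
    return gamma + math.sqrt(gamma * gamma - 1.0)

def tau_max(a, b, R):
    """tau_max = 1 / log(A^2 (1 v bR)) from the proof of Corollary 1."""
    A = growth_constant(a, b)
    return 1.0 / math.log(A * A * max(1.0, b * R))

def asymptotic_constants(R, b, tau, alpha):
    """Upper constant tau^(-2 alpha) of (48) and lower constant (log(R/b))^(2 alpha) of (49), per C^2."""
    return tau ** (-2 * alpha), math.log(R / b) ** (2 * alpha)

# Geometric mixands: a_k = 1, Z(t) = 1/(1 - t), R = 1, (A2) with c_0 = 1.
a, b, R, alpha = 0.1, 0.5, 1.0, 2.0
tau = 0.9 * tau_max(a, b, R)
m_n = math.floor(tau * math.log(10 ** 6))   # m_n = [tau log n] for n = 10^6
upper, lower = asymptotic_constants(R, b, tau, alpha)
```

The gap between `upper` and `lower` reflects the factor $\eta^{2\alpha}$ discussed after Proposition 8; it does not affect the rate $(\log n)^{-2\alpha}$.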

We note that Loh and Zhang [17] proved the rate $(\log n)^{-2\alpha}$ to be minimax over different smoothness classes, but also under different assumptions on $(a_k)$. Corollary 1 extends their results to other classes and mixands, but only for $b < R < \infty$.

We now consider the Poisson case. We already know from Proposition 8 that we can find a universal projection estimator whose rate is optimal in classes $\mathcal{C}(\alpha,C)$, which complements the results of Hengartner [12] and Loh and Zhang [17]. This does not address asymptotic efficiency, however, which requires computing asymptotic constants. It turns out that a direct computation of $(m_n)$ provides the following precise result for $\mathcal{C}(u^\alpha,C,r)$.
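Corollary 2. Let $\nu$ be Lebesgue measure on $[a,b]$ with $0 \le a < b < R$ and suppose that $a_k = 1/k!$ (Poisson mixands, $R = \infty$). Let $\alpha > 1$ and $C > 1/\sqrt{b-a}$. Then the minimax MISE rate over $\mathcal{C}(\alpha,C)\cap\mathbb{H}_1$ is $(\log n/\log\log n)^{-2\alpha}$. This rate is achieved by the projection estimator $\hat f_{m_n,n}$ with $m_n := [\tau\log n/\log\log n]$ for any positive $\tau \le 1$. Moreover, if $\alpha > 1$, then for any positive $C$ and any positive integer $r$, the projection estimator $\hat f_{m_n,n}$ defined as above with $\tau = 1$ is asymptotically minimax efficient (including the constant) over $\mathcal{C}(u^\alpha,C,r)$:

$$\begin{aligned}
&\limsup_{n\to\infty}\Big(\frac{\log n}{\log\log n}\Big)^{2\alpha}\sup_{f\in\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m_n,n} - f\|_{\mathbb{H}}^2 \\
&\quad= \liminf_{n\to\infty}\Big(\frac{\log n}{\log\log n}\Big)^{2\alpha}\inf_{\hat f\in\mathcal{S}_n}\ \sup_{f\in\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f - f\|_{\mathbb{H}}^2 \\
&\quad= C^2.
\end{aligned}$$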

Proof. Consider (45). By Stirling's formula, $\max_{0\le k\le m}(b^k/a_k) = O(m^mc^m)$ for a positive $c$. Since $m_n \le \tau\log n/\log\log n$ for $\tau$ in $(0,1]$, a simple computation yields

$$\log(m_n^{m_n}c^{m_n}/n) \le (\tau-1)\log n - (\tau+o(1))\frac{\log n\,\log\log\log n}{\log\log n}.$$

Condition (45) follows for any $\Lambda_1 > \Lambda$ (indeed, $\Lambda_1$ simply multiplies $c$).

Now assume $\tau = 1$ and consider part (b) of Theorem 3. Use Lemma 4 once again to see that $\sum_{k\ge m}a_kb^k = O(a_mb^m)$. In the present case $a_mb^m = O(m^{-m}c^m)$ for a positive $c$ (not the same as above). For any $\sigma$ in $(0,1)$, it holds that $m_n \ge \sigma\log n/\log\log n$ for large $n$. As usual, we set $m'_n := [\eta m_n]$ for a positive number $\eta$ to be optimized later on, and define $(w_n)$ as in (46). Thus, if $\eta\sigma > 1$, then, up to an additive constant,

$$\log w_n \le \log\big(n(\eta m_n)^{-\eta m_n}c^{\eta m_n}\big) \le (1-\eta\sigma+o(1))\log n \to -\infty.$$

The conclusions of the corollary now follow by applying the various parts of Theorem 3. In particular, the lower bound on the asymptotic MISE, normalized by the rate $(\log n/\log\log n)^{-2\alpha}$, is obtained upon observing that we may choose $\sigma$ and, hence, also $\eta$, arbitrarily close to 1. □

We recall that, in contrast to Corollary 2, Hengartner [12] did not consider constants, and the exact rate was proved only for Sobolev classes with integer exponents. Likewise, Theorem 5 in [17] does not provide constants, and optimal rates are obtained only in cases similar to Hengartner [12] (cf. the paragraph ending Section 5.2 above). By determining the asymptotic constant, we also answer a question raised by Hengartner ([12], page 921, Remark 4). He suggested that the optimal $\tau$, in terms of the asymptotic constant, may depend on the smoothness class under consideration. The above result shows that, at least over a wide range of classes, it does not. Furthermore, Loh and Zhang [17] proposed an adaptive method to determine $\tau$ in the formula $m_n = \tau\log n/\log\log n$ for fixed $n$, but this adaptive method was not proved to perform better than a constant $\tau$. Corollary 2 shows that such an adaptive procedure is not needed and that $\tau$ can be taken equal to one.

The final part of Corollary 2, which says that the projection estimator is asymptotically minimax efficient in the Poisson case, is a theoretical argument corroborating the conclusions of the empirical study of Loh and Zhang [17]. Indeed, in a simulation study they compared the projection estimator to a kernel estimator and found that the former performed significantly better for finite sample sizes. Both estimators achieve the optimal rate, but the kernel estimator does not exploit the polynomial structure of the classes $\mathcal{C}(u^\alpha,C,r)$; this probably introduces a nonnegligible constant in its asymptotics.



5.4. Proof of Theorem 3. Throughout the proof we denote by $K_i$ some constants depending only on $(a_k)$, $a$ and $b$.

We start by proving (a). Take $f$ in $\mathcal{C}(u^\alpha,C,r)\cap\mathbb{H}_1$. In the bias-variance decomposition of Proposition 3, the first term is then bounded by $C^2(m+1)^{-2\alpha}$ for all $m \ge r$. We now bound the second term in the right-hand side of (10). Using (34) and (35), and recalling that the $q_k$ are orthonormal in $\mathbb{H}$, $\mathrm{var}_f(\hat f_{m,1})$ is bounded by

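$$\mathbb{E}_f\|\hat f_{m,1}\|_{\mathbb{H}}^2 \le \sum_{0\le l\le k\le m}\big(Q_{k,l}\big)^2\,\frac{\pi_fI_l}{a_l^2}. \tag{50}$$

Moreover,

$$\pi_fI_l = \int_a^b f(\theta)\,a_l\theta^lZ(\theta)\,d\theta \le a_lb^lZ(a).$$

Finally, using the bound on $\sum_l(Q_{k,l})^2$ given by Lemma A.1, we obtain, for $\Lambda_0 > \Lambda$ (recall that $\Lambda > 1$ depends only on $a$ and $b$) and $m \ge r$,

$$\mathbb{E}_f\|\hat f_{m,n}-f\|_{\mathbb{H}}^2 \le C^2(m+1)^{-2\alpha} + \frac{K_1\Lambda_0^{2m}}{n}\max_{0\le k\le m}\frac{b^k}{a_k}.$$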

Taking $\Lambda_0$ in $(\Lambda,\Lambda_1)$, this proves (a) [see also remark (iii) following the statement of the theorem].
Next we prove (b). Let $f_0 = c\,\Pi I_0 = c\,\pi_\cdot(0) = c\,a_0Z$ with $c > 0$ such that $f_0$ is in $\mathbb{H}_1$. Observing that the conditions given in Proposition 6 are satisfied for our choice of $\nu$, we can apply Theorem 1. It remains to verify that (18) implies (47) in this context. Since $f_0$ is in all $V_m$ for positive $m$, $f_0 + \mathcal{C}_\infty(K,u^\alpha,C,r) \subset \mathcal{C}(u^\alpha,C,r)$ for any $K > 0$, say $K = 1$, and any positive $r$. Thus, (18) provides a lower bound on the left-hand side of (47). Next we bound the right-hand side of (18) from below. Note that, for this choice of $f_0$, (41) holds for $c_1$ and $c_2$ depending only on $(a_k)$, $a$ and $b$, so that, by (44) and the observation $K_{\infty,f_0}(V_{m+2}\ominus V_m) \le K_{\infty,f_0}(V_{m+2})$, we find that, for all $m$,

$$\frac{1}{K_{\infty,f_0}(V_{m+2}\ominus V_m)}\wedge Cu_{m+1} \ge \frac{K_2}{m+2}\wedge C(m+2)^{-\alpha}. \tag{51}$$

Since we have assumed $\alpha > 1$, the right-hand side of this expression equals $C(m+2)^{-\alpha}$ for large $m$. Regarding the second factor in the right-hand side of (18), we note that, since $f_0$ is bounded on $[a,b]$, there is a positive constant $K_3$ such that $\pi_{f_0}h \le K_3\,\nu\Pi h$ for all $h \ge 0$. Thus, $\pi_{f_0}\{0,\dots,m'_n-1\} = 1 - \sum_{k\ge m'_n}\pi_{f_0}I_k \ge 1 - K_3n^{-1}w_n$, so that, under the assumption $w_n \to 0$,

$$\big(\pi_{f_0}\{0,\dots,m'_n-1\}\big)^n \ge \exp\big(n\log(1-K_3n^{-1}w_n)\big) \sim \exp(-K_3w_n).$$

By applying Theorem 1 as explained above, the last two displayed equations prove (47), with the more detailed lower bound claimed in remark (iii) following the statement of the theorem.

Finally we show (c). The rate of the projection estimator follows from part (a), already proved, and the equivalence relationship (38) between the classes $\mathcal{C}(u^\alpha,C)$ and $\mathcal{C}(\alpha,C)$. We now turn to proving optimality of this rate for $\alpha > 1$. Optimality over $\mathcal{C}(\alpha,C)$ is established as in the proof of (b), but taking $f_0(t) = 1/(b-a)$ for $a \le t \le b$: we apply Theorem 1 and verify that, for this choice of $f_0$, (18) implies the lower bound

$$\inf_{\hat f\in\mathcal{S}_n}\sup_{f\in\mathcal{C}(\alpha,C)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f-f\|_{\mathbb{H}}^2 \ge K_4\,{m'_n}^{-2\alpha}$$

for large enough $n$. Here are the details. Since $f_0$ is in $\mathcal{C}(\alpha,1/\sqrt{b-a})$ (as pointed out in Lemma 3) and $\omega_\varphi(f,t)_p$ is a seminorm in $f$, $f_0 + \mathcal{C}(\alpha,\delta) \subset \mathcal{C}(\alpha,C)$ whenever $\delta + 1/\sqrt{b-a} \le C$. By Proposition 7, $f_0 + \mathcal{C}(u^\alpha,C_1\delta) \subset f_0 + \mathcal{C}(\alpha,\delta)$ for all $\delta > 0$, where $C_1$ is as in (38). Adding a constraint on the supremum norm makes an even smaller class, whence $f_0 + \mathcal{C}_\infty(K,u^\alpha,C_1\delta) \subset \mathcal{C}(\alpha,C)$ for any $K > 0$, say $K = 1$, provided $\delta$ is sufficiently small. Thus, the lower bound of Theorem 1 applies. Using the same arguments as in the proof of (b), we find that this lower bound behaves as $K_4\big((m'_n+2)^{-2\alpha}\wedge(m'_n+2)^{-2}\big)$, which has the same rate as ${m'_n}^{-2\alpha}$ for all $\alpha > 1$. This completes the proof of (c).
5.5. Proof of Proposition 8. We first note that, for $R < \infty$, the proof is completely contained in the proof of Corollary 1. Thus, we consider here the case $R = \infty$ only. Then the conclusions drawn from the subadditive lemma [the paragraph following (A2)] are not helpful, as we can only conclude that $a_k = O(\varepsilon^k)$ for any $\varepsilon > 0$; the latter is indeed implied by $R = \infty$ alone. Thus, a more refined analysis is necessary, as in the proof of Corollary 2.

First we note that, since $b < R$, it holds that $a_kb^k = O(\varepsilon^k)$ for any $\varepsilon > 0$. Thus,

$$\max_{0\le k\le m}\frac{b^k}{a_k} = \max_{0\le k\le m}\frac{a_kb^k}{a_k^2} = O\Big(\frac{1}{\min_{0\le k\le m}a_k^2}\Big)$$

and, consequently, the condition (45) on $(m_n)$ is implied by the stronger requirement

$$\frac{\Lambda_1^{2m_n}}{n\,\min_{0\le k\le m_n}a_k^2} \to 0. \tag{52}$$

The reason why (52) is not used in Theorem 3 is that one would lose the constant derived for the Poisson case in Corollary 2. Moreover, using Lemma 4, we see that, for $(w_n)$ to converge to zero with $m'_n = [\eta m_n]$, it is sufficient that

$$n\,a_{[\eta m_n]}\,b^{[\eta m_n]} \to 0. \tag{53}$$

It remains to check that there exist $\Lambda_1 > \Lambda$, $\eta > 0$ and $(m_n)$ such that (52) and (53) hold. We do this by a constructive proof.

To this end, take $\Lambda_1 > \Lambda$ arbitrary. The cornerstone of the construction is the following claim, to be proved below: we can find $\eta$, a positive number $C_1$ and $\kappa$ in $(0,1)$ such that, for all $m \ge 0$,

$$\frac{\Lambda_1^{2(m+1)}\,a_{[\eta m]}\,b^{[\eta m]}}{\min_{0\le k\le m+1}a_k^2} \le C_1\kappa^m. \tag{54}$$

Given that (54) holds, put

$$m_n = \max\Big\{m : \frac{\Lambda_1^{2m}\,\kappa^{-m/2}}{\min_{0\le k\le m}a_k^2} \le n\Big\}.$$

Since $\Lambda_1 > \Lambda > 1$ and $\kappa < 1$, the sequence $\Lambda_1^{2m}\kappa^{-m/2}/\min_{0\le k\le m}a_k^2$ is nondecreasing in $m$ and tends to infinity. Thus, $m_n$ is finite for all $n$, and $(m_n)$ is nondecreasing and tends to infinity. Moreover, $\Lambda_1^{2m_n}/\min_{0\le k\le m_n}a_k^2 \le n\kappa^{m_n/2}$, and (52) follows. On the other hand, from the definition of $m_n$ and (54),

$$n < \frac{\Lambda_1^{2(m_n+1)}\,\kappa^{-(m_n+1)/2}}{\min_{0\le k\le m_n+1}a_k^2} \le \frac{C_1\,\kappa^{(m_n-1)/2}}{a_{[\eta m_n]}\,b^{[\eta m_n]}}$$

for large $n$. Hence, (53) follows and the construction is complete.
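Since the proof constructs $(m_n)$ explicitly, the sequence can be computed. The following Python sketch assumes Poisson mixands ($a_k = 1/k!$) and uses the hypothetical values $\Lambda_1 = 11$ and $\kappa = 1/2$ for illustration only, because the proof merely asserts that suitable $\eta$, $C_1$ and $\kappa$ exist. It works on the log scale to avoid overflow and scans $m$ upward. Stopping at the first failure is valid because the criterion is nondecreasing in $m$.

```python
import math

def poisson_log_a(k):
    """Poisson mixands: a_k = 1/k!, so log a_k = -log k!."""
    return -math.lgamma(k + 1)

def log_criterion(m, log_a, lam1, kappa):
    """log of Lambda_1^{2m} kappa^{-m/2} / min_{0<=k<=m} a_k^2."""
    min_log_a = min(log_a(k) for k in range(m + 1))
    return 2 * m * math.log(lam1) - 0.5 * m * math.log(kappa) - 2 * min_log_a

def choose_m(n, log_a, lam1, kappa):
    """m_n = max{m : criterion(m) <= n}, found by scanning m upward.

    Stopping at the first failure is valid because the criterion is
    nondecreasing in m when lam1 > 1 and kappa < 1.
    """
    log_n = math.log(n)
    if log_criterion(0, log_a, lam1, kappa) > log_n:
        raise ValueError("criterion exceeds n already at m = 0")
    m = 0
    while log_criterion(m + 1, log_a, lam1, kappa) <= log_n:
        m += 1
    return m

if __name__ == "__main__":
    lam1, kappa = 11.0, 0.5   # hypothetical values; the proof only asserts existence
    for n in (10, 10**3, 10**6, 10**12):
        print(n, choose_m(n, poisson_log_a, lam1, kappa))
```

The helper names are illustrative. For Poisson weights, $1/\min_{k\le m}a_k^2 = (m!)^2$, which is why $m_n$ grows so slowly with $n$.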

It now remains to prove the claim (54). To do this, let $(r_m)_{m\ge0}$ be a sequence such that $a_{r_m} = \min_{0\le k\le m}a_k$. Applying (A2) yields

$$c_0\min_{0\le k\le m}a_k^2 = c_0a_{r_m}^2 \ge a_{2r_m}\quad\text{for all } m \ge 0.$$

Now put $s = 2r_{m+1}$ and $t = [\eta m]$, and note that, since $2r_{m+1} \le 2m$, for any $\eta > 2$ it holds that $t > s$. In addition, fix $p$ sufficiently large that $c_0a_p < 1$ and $c_0a_pb^p < 1$. The latter can be done since, as noted above, $a_kb^k = O(\varepsilon^k)$ for any $\varepsilon > 0$. Applying (A2) repeatedly, we easily obtain that there is a constant $c_1 > 0$ such that

$$a_t \le c_1\,a_s\,(c_0a_p)^{[(t-s)/p]}.$$

Using $2r_{m+1} \le 2m$ again, we find that $[(t-s)/p] \ge (t-s)/p - 1 \ge \{(\eta-2)m-1\}/p - 1$. Recalling that $c_0a_p < 1$ and combining the last two displays, this yields

$$a_{[\eta m]} \le c_0c_1\Big(\min_{0\le k\le m+1}a_k^2\Big)(c_0a_p)^{\{(\eta-2)m-1\}/p-1} \le C_0\min_{0\le k\le m+1}a_k^2\,\big((c_0a_p)^{(\eta-2)/p}\big)^m$$

for a positive $C_0$. Then, with

$$\kappa := \frac{\Lambda_1^2\,\big((c_0a_p)^{1/p}b\big)^{\eta}}{(c_0a_p)^{2/p}},$$

we see that (54) holds for a positive $C_1$. Finally, since $(c_0a_p)^{1/p}b < 1$, we can choose $\eta$ large enough that $\kappa < 1$. The proof is complete.

6. Discrete deconvolution. In this section we take $\mathcal{X} = \Theta = \mathbb{Z}$ and let $\zeta$ be counting measure. Let also $p$ be a fixed and known probability mass function on $\mathbb{Z}$, and consider the mixands $\pi_\theta(\cdot) = p(\cdot-\theta)$. Another way to view this setup is the following. Take independent random variables $\varepsilon$ and $U$, both in $\mathbb{Z}$, with probability mass functions $p$ and $f$, respectively, and put $X = U + \varepsilon$. Then the probability mass function of $X$ is the convolution $(f*p)(\cdot) = \sum_\theta p(\cdot-\theta)f(\theta)$ of $f$ and $p$, which we can also write as $\sum_\theta\pi_\theta(\cdot)f(\theta)$. Our interest in recovering $f$ from i.i.d. observations of $X$ can thus be phrased as a deconvolution problem. Note that this setting includes the case where $\varepsilon$ is zero, that is, estimating a discrete distribution.

Observe that, for all integers $k$, $\Pi I_k = p(k-\cdot)$. Applying the general approach of Section 4, we take $V_0 = \{0\}$ and, for all $m \ge 1$, $V_m := \mathrm{Span}(\Pi I_k : |k| \le m)$. This, of course, defines an increasing sequence of linear spaces. It remains to choose the measure $\nu$ or, equivalently, the space $\mathbb{H} = L^2(\nu)$. A natural choice is to let $\nu$ be counting measure, that is, $\mathbb{H} = \ell^2(\mathbb{Z})$. Then, since $p$ is square-summable, $(V_m)$ is a sequence of subspaces of $\mathbb{H}$. It is practical to define the projection estimator using Fourier series. Thus, let

$$p^*(\lambda) = \sum_{k\in\mathbb{Z}}p(k)e^{-ik\lambda}\quad\text{for all }\lambda\in(-\pi,\pi]$$

be the Fourier series with coefficients $(p(k))_{k\in\mathbb{Z}}$. Then $p^* \in L^2(-\pi,\pi]$ and, because $p$ is a probability mass function, $p^*$ is continuous with positive norm. The Fourier series with coefficients $(\Pi I_k(\theta))_{\theta\in\mathbb{Z}}$ is simply $p^*(-\lambda)e^{-ik\lambda}$. Because there necessarily is an interval on which $p^*$ is nonzero, $\{p^*(-\lambda)e^{-ik\lambda}\}_{k\in\mathbb{Z}}$ is linearly independent, and assumption (A1) then follows immediately. Hence, the projection estimator is well defined and the results of Section 4 apply.

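Let us derive the expression for the projection estimator $\hat f_{m,n}$ through the Fourier series $\hat f^*_{m,n}$ with Fourier coefficients $(\hat f_{m,n}(k))_{k\in\mathbb{Z}}$. Let $P_n^*$ be the Fourier series associated with the coefficients $(P_nI_k)_{k\in\mathbb{Z}}$,

$$P_n^*(\lambda) = \sum_{k\in\mathbb{Z}}(P_nI_k)e^{-ik\lambda}\quad\text{for all }\lambda\in(-\pi,\pi].$$

Then, applying Parseval's formula to (7) with $g = \Pi I_k$, $\hat f^*_{m,n}$ is the unique element in $\mathrm{Span}(e^{-ik\lambda}p^*(-\lambda) : |k| \le m)$ which satisfies, for all $k = -m,\dots,m$,

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\hat f^*_{m,n}(\lambda)\,p^*(\lambda)\,e^{ik\lambda}\,d\lambda = P_nI_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}P_n^*(\lambda)\,e^{ik\lambda}\,d\lambda. \tag{55}$$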

Here we treat the special case where the following condition holds:

$$K_p := \int_{-\pi}^{\pi}\frac{d\lambda}{|p^*(\lambda)|^2} < \infty. \tag{56}$$

This condition implies that $p^*$ may only vanish on a Lebesgue null set, and only in a singular way (it cannot have a finite derivative where it vanishes). It is, of course, satisfied if, for instance, $|p^*|$ is bounded away from zero, which includes the case of estimating a discrete distribution ($\varepsilon = 0$). If (56) holds, there is a function in $L^2(-\pi,\pi]$ such that (55) holds for all $k \in \mathbb{Z}$. Indeed, since $P_n$ is a probability, $|P_n^*|$ is bounded by one, so that $P_n^*/p^*$ is in $L^2(-\pi,\pi]$ and satisfies (55) for all $k$ whenever (56) holds. This amounts to taking the definition of the projection estimator to its limit $m = \infty$; hence, we put $\hat f^*_{\infty,n} := P_n^*/p^*$ or, equivalently,

$$\hat f_{\infty,n}(k) := \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{P_n^*(\lambda)}{p^*(\lambda)}\,e^{ik\lambda}\,d\lambda\quad\text{for all }k\in\mathbb{Z}.$$

To compute the expectation of $\hat f_{\infty,n}$, first apply (8) and then Parseval's formula to obtain

$$\mathbb{E}_fP_n^*(\lambda) = \sum_{k\in\mathbb{Z}}\pi_f(k)e^{-ik\lambda} = \sum_{k\in\mathbb{Z}}(f*p)(k)e^{-ik\lambda} = p^*(\lambda)f^*(\lambda)\quad\text{for all }\lambda\in(-\pi,\pi],$$

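where $f^*$ is the Fourier series with coefficients $(f(k))_{k\in\mathbb{Z}}$. The last two equations give

$$\mathbb{E}_f\hat f_{\infty,n}(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\mathbb{E}_fP_n^*(\lambda)}{p^*(\lambda)}\,e^{ik\lambda}\,d\lambda = f(k)\quad\text{for all }k\in\mathbb{Z}, \tag{57}$$

where, as $|P_n^*/p^*| \le 1/|p^*|$ is integrable, we applied Fubini's theorem. Hence, $\hat f_{\infty,n}$ is unbiased, as expected, since it corresponds to $m = \infty$. Furthermore,

$$\mathrm{var}_f(\hat f_{\infty,n}) = \frac1n\,\mathrm{var}_f(\hat f_{\infty,1}) \le \frac{1}{2\pi n}\int_{-\pi}^{\pi}\mathbb{E}_f|\hat f^*_{\infty,1}(\lambda)|^2\,d\lambda \le \frac{K_p}{2\pi n}. \tag{58}$$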
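Computing the estimator $\hat f_{\infty,n}$ only requires evaluating $P_n^*/p^*$ and extracting Fourier coefficients. Below is a minimal NumPy sketch. It assumes that $p$ and $P_n$ have finite support, and it approximates the integral over $(-\pi,\pi]$ by the periodic rectangle rule on an equispaced grid, which is an implementation choice and not taken from the text. The illustrative error distribution $p = (0.7, 0.3)$ on $\{0,1\}$ satisfies (56) because $|p^*| \ge 0.4$.

```python
import numpy as np

def fourier_series(support, weights, lam):
    """Evaluate sum_j weights[j] * exp(-i * support[j] * lam) at each point of lam."""
    return np.exp(-1j * np.outer(lam, support)) @ weights

def f_hat_infinity(x_support, pn, eps_support, p, k_out, n_grid=4096):
    """Sketch of f_hat_{inf,n}(k) = (1/2pi) int_{-pi}^{pi} P_n^*/p^* e^{ik lam} d lam.

    Assumes (56), so p^* does not vanish on the grid. The integral is replaced by
    the mean over n_grid equispaced points (periodic rectangle rule), which is
    exact for trigonometric polynomials of degree below n_grid.
    """
    lam = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    ratio = fourier_series(x_support, pn, lam) / fourier_series(eps_support, p, lam)
    return (np.exp(1j * np.outer(k_out, lam)) @ ratio / n_grid).real

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = np.array([0.2, 0.5, 0.3])      # illustrative mixing pmf on {0, 1, 2}
    p = np.array([0.7, 0.3])           # error pmf on {0, 1}: |p^*| >= 0.4
    n = 100_000
    x = rng.choice(3, size=n, p=f) + rng.choice(2, size=n, p=p)   # X = U + eps
    pn = np.bincount(x, minlength=4) / n                          # P_n on {0, ..., 3}
    print(f_hat_infinity(np.arange(4), pn, np.arange(2), p, np.arange(-1, 4)))
```

If the exact $\pi_f = f*p$ is used in place of $P_n$, the sketch returns $f$ up to rounding. Nothing in the construction forces $\hat f_{\infty,n}$ to be nonnegative.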



As shown by the following result, we are in a case where the minimax MISE rate $n^{-1}$ is achieved by $\hat f_{\infty,n}$ over any reasonable subclass of $\mathbb{H}_1$. This is a degenerate case compared to the general setting of Section 4, because smoothness conditions of the form (14) do not improve the minimax rate. To formulate the result, we define the line segment $[f_0,f_1]$ between $f_0$ and $f_1$ as $[f_0,f_1] := \{(1-w)f_0 + wf_1 : w\in[0,1]\}$.

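Theorem 4. Assume that (56) holds. Then the MISE of the projection estimator $\hat f_{\infty,n}$ has rate $n^{-1}$ over $\mathbb{H}_1$, and this rate is minimax over any class $\mathcal{C}$ included in $\mathbb{H}_1$ and containing a line segment $[f_0,f_1]$ for distinct $f_0$ and $f_1$ in $\mathbb{H}_1$. More precisely, for any $f_0$ and $f_1$ in $\mathbb{H}_1$ and any positive integer $n$,

$$c_0(K_1+K_2c_1n)^{-1} \le \inf_{\hat f\in\mathcal{S}_n}\sup_{f\in[f_0,f_1]}\mathbb{E}_f\|\hat f-f\|_{\mathbb{H}}^2 \le \sup_{f\in\mathbb{H}_1}\mathbb{E}_f\|\hat f_{\infty,n}-f\|_{\mathbb{H}}^2 \le \frac{K_p}{2\pi n},$$

where $K_1$ and $K_2$ are universal positive constants, $c_0 := \sum_{l\in\mathbb{Z}}(f_1(l)-f_0(l))^2$ and $c_1 := \sum_{l\in\mathbb{Z}}|\pi_{f_1}(l)-\pi_{f_0}(l)|$.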
Remark. Observe that the lower bound reduces to a positive constant only when the model is not identifiable, that is, if there exist distinct $f_0$ and $f_1$ in $\mathbb{H}_1$ such that $c_1 = 0$. In this case (56) cannot be fulfilled.

Proof of Theorem 4. Since $\hat f_{\infty,n}$ is unbiased, the upper bound is simply (58).

The rate of the lower bound on the MISE holds generally for any regular parametric statistical model. Here we derive it via a Bayes risk lower bound. Consider the parametric model $\{\pi_{(1-w)f_0+wf_1} : w\in(0,1)\}$. Take any continuously differentiable prior density $r$ on $w\in(0,1)$ which is symmetric about $1/2$: $r(1/2+w) = r(1/2-w)$. Let $\mathcal{I}(w) = \mathbb{E}_{(1-w)f_0+wf_1}\big[(\partial_w\log\pi_{(1-w)f_0+wf_1}(X))^2\big]$ and $\mathcal{I}(r) = \int_0^1 r'(w)^2/r(w)\,dw$. Then, for any estimator $\hat f$ based on $n$ observations,

$$\begin{aligned}\sup_{f\in[f_0,f_1]}\mathbb{E}_f\|\hat f-f\|_{\mathbb{H}}^2 &\ge \int_0^1\mathbb{E}_{(1-w)f_0+wf_1}\big\|\hat f-((1-w)f_0+wf_1)\big\|_{\mathbb{H}}^2\,r(w)\,dw\\ &= \sum_{k\in\mathbb{Z}}\int_0^1\mathbb{E}_{(1-w)f_0+wf_1}\big[(\hat f(k)-((1-w)f_0(k)+wf_1(k)))^2\big]\,r(w)\,dw\\ &\ge \sum_{k\in\mathbb{Z}}\frac{(f_1(k)-f_0(k))^2}{n\int_0^1\mathcal{I}(w)r(w)\,dw+\mathcal{I}(r)}.\end{aligned}\tag{59}$$

Here the first inequality is the Bayes risk lower bound on the minimax risk and the second one is the van Trees inequality (a Bayesian Cramér-Rao bound); see, for example, [11]. We easily compute

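$$\mathcal{I}(w) = \sum_{k\in\mathbb{Z}}\frac{(\pi_{f_1}(k)-\pi_{f_0}(k))^2}{(\pi_{f_0}(k)+\pi_{f_1}(k))/2+(w-1/2)(\pi_{f_1}(k)-\pi_{f_0}(k))},$$

with the convention $0/0 = 0$. Using the symmetry of $r$, we obtain

$$\int_0^1\mathcal{I}(w)r(w)\,dw = \sum_{k\in\mathbb{Z}}|\pi_{f_1}(k)-\pi_{f_0}(k)|\int_0^1\frac{r(w)\,dw}{(\pi_{f_0}(k)+\pi_{f_1}(k))/(2|\pi_{f_1}(k)-\pi_{f_0}(k)|)+(w-1/2)} \le \sum_{k\in\mathbb{Z}}|\pi_{f_1}(k)-\pi_{f_0}(k)|\int_0^1w^{-1}r(w)\,dw,$$

where, for the inequality, we simply used $a+b \ge |a-b|$ for any nonnegative $a$ and $b$. Hence, (59) gives the required lower bound, where $K_1 = \mathcal{I}(r)$ and $K_2 = \int_0^1w^{-1}r(w)\,dw$ are fixed once a particular choice of $r$ is made.

□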
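The two quantities compared in the proof, $\int_0^1\mathcal{I}(w)r(w)\,dw$ and $c_1K_2$, can be evaluated numerically for a concrete prior. The sketch below assumes the prior $r(w) = 2\sin^2(\pi w)$, which is smooth and symmetric about $1/2$ and has $\mathcal{I}(r) = 4\pi^2$. This choice of prior and the example densities are illustrative assumptions. The integrals over $(0,1)$ are approximated by a midpoint rule.

```python
import numpy as np

def van_trees_terms(pi0, pi1, n_grid=200_000):
    """Return (int_0^1 I(w) r(w) dw, c_1 * K_2) for the prior r(w) = 2 sin^2(pi w).

    pi0, pi1 are the mixture pmfs pi_{f_0}, pi_{f_1} on a common finite support.
    Integrals over (0, 1) use the midpoint rule on n_grid cells.
    """
    w = (np.arange(n_grid) + 0.5) / n_grid
    r = 2.0 * np.sin(np.pi * w) ** 2                  # symmetric about 1/2, integrates to 1
    d = pi1 - pi0
    keep = d != 0                                      # convention 0/0 = 0
    pw = (1 - w)[:, None] * pi0[keep] + w[:, None] * pi1[keep]
    fisher = np.sum(d[keep] ** 2 / pw, axis=1)         # I(w) along the segment
    lhs = np.mean(fisher * r)
    c1 = np.sum(np.abs(d))
    k2 = np.mean(r / w)                                # K_2 = int_0^1 r(w) / w dw
    return lhs, c1 * k2

def van_trees_lower_bound(f0, f1, p, n):
    """c_0 / (K_1 + K_2 c_1 n), with K_1 = I(r) = 4 pi^2 for this prior."""
    pi0, pi1 = np.convolve(f0, p), np.convolve(f1, p)  # pi_f = f * p, supports starting at 0
    c0 = np.sum((f1 - f0) ** 2)
    _, c1k2 = van_trees_terms(pi0, pi1)
    return c0 / (4 * np.pi**2 + c1k2 * n)

if __name__ == "__main__":
    f0 = np.array([0.5, 0.5, 0.0])     # illustrative endpoints of the segment
    f1 = np.array([0.2, 0.3, 0.5])
    p = np.array([0.7, 0.3])
    print(van_trees_terms(np.convolve(f0, p), np.convolve(f1, p)))
    for n in (10, 100, 1000):
        print(n, van_trees_lower_bound(f0, f1, p, n))
```

A prior more concentrated near $w = 1/2$ lowers $K_2$ toward 2 but raises $K_1 = \mathcal{I}(r)$, which is the tradeoff described in the remark that follows.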

Remark. There is a tradeoff between $K_1$ and $K_2$ when the prior density $r$ is chosen to optimize the lower bound for finite $n$. Asymptotically, the lower bound is equivalent to $c_0/(K_2c_1n)$, and it is easy to see that the infimum of the possible values of $K_2$ is 2 (let $r$ tend to a point mass located at $w = 1/2$). Hence,

$$\liminf_{n\to\infty}\inf_{\hat f\in\mathcal{S}_n}\sup_{f\in[f_0,f_1]}n\,\mathbb{E}_f\|\hat f-f\|_{\mathbb{H}}^2 \ge \frac{c_0}{2c_1}.$$

The right-hand side depends on $f_0$ and $f_1$ and should be compared to the asymptotic upper bound $K_p/(2\pi)$.

The statistical literature on deconvolution is vast, but it is primarily concerned with continuous random variables having densities with respect to Lebesgue measure, often on $\mathbb{R}$. Some key references on achievable minimax rates of convergence over suitable smoothness classes in that setting are [2] and [6] for pointwise estimation of the mixing density, and [5, 7, 8, 22] for (weighted) $L^2$ loss. The difficulty of the estimation then depends on whether the characteristic function of $\varepsilon$, that is, essentially our $p^*$, vanishes algebraically or exponentially fast at infinity; these cases are referred to as ordinary smooth and supersmooth error densities, respectively. With an ordinary smooth error density, the optimal rate of convergence is algebraic in $n$, whereas it is algebraic in $\log n$ when the error density is supersmooth.

In the discrete setting considered here, the notions of ordinary smooth and supersmooth error densities are void, since $p^*$ is defined on a compact interval, the unit circle. The MISE rate $n^{-1}$ of Theorem 4 is also faster than the rates obtained in the papers cited above; it only appears as a limit in the ordinary smooth case when the unknown density $f$ has infinite smoothness. In the discrete case, the rate may not hold when (56) fails; some additional remarks on this issue are given in Section 8.

7. Mixtures of uniform discrete distributions. We now take $\Theta = \mathbb{N} := \{1,2,\dots\}$, $\mathcal{X} = \mathbb{Z}_+$ and let $\zeta$ and $\nu$ both be counting measure. Thus, $\mathbb{H} = \ell^2(\Theta)$. Consider mixands given by the family of uniform discrete distributions on $\{0,1,\dots,\theta-1\}$; that is, $\Pi I_k(\theta) = \pi_\theta(k) = \theta^{-1}$ for $0 \le k \le \theta-1$ and $0$ otherwise. Observe that, for all $k \ge 0$, $\Pi I_k(\theta) = \theta^{-1}$ for large $\theta$, so that $\Pi I_k$ is not in $\ell^1(\Theta)$ and (A1) does not hold. Letting the space $V_m$ be spanned by $(\Pi I_k)_{0\le k\le m}$ as in Section 4 would then yield an estimator $\hat f_{m,n}$ that is a linear combination of nonintegrable functions and, hence, a poor estimator of the mixing density. It is possible to circumvent this problem by replacing $\nu$ by a distribution such that $((1+\theta)^{-1})_{\theta\ge1}$ belongs to $L^2(\nu)$, but then the difficulty would lie in the definition of the smoothness classes (14). Indeed, in this case a different choice of $V_m$ provides a much simpler definition of smoothness classes. For all $k \ge 0$, we let $h_k = (k+1)(I_k - I_{k+1})$, which yields

$$\Pi h_k = (k+1)(\Pi I_k - \Pi I_{k+1}) = I_{k+1}.$$

Hence, $(\Pi h_k)_{k\ge0}$ is itself the orthonormal basis denoted by $(\varphi_k)$ in Section 3. It follows that

$$V_m = \mathrm{Span}(\Pi h_k, 0 \le k < m) = \{f\in\ell^2(\Theta) : f(\theta) = 0\text{ for all }\theta > m\},$$

the projection estimator is

$$\hat f_{m,n}(k) = k\,P_n(I_{k-1}-I_k)\,\mathbb{1}(k \le m)\quad\text{for all }k \ge 1,$$
and the smoothness classes of Section 4 read

$$\mathcal{C}(u,C,r) = \Big\{f : \sum_{k>m}f(k)^2 \le C^2u_m^2\text{ for all }m \ge r\Big\}. \tag{60}$$

Since $\mathrm{var}_f(h_k) \le (k+1)^2(\pi_f(k)+\pi_f(k+1))$ for any $f$ in $\mathbb{H}_1$, Proposition 3, along with (11), gives, for all $m$,

$$\mathbb{E}_f\|\hat f_{m,n}-f\|_{\mathbb{H}}^2 \le \sum_{\theta>m}f(\theta)^2 + \frac1n\sum_{k=0}^{m-1}(k+1)^2(\pi_f(k)+\pi_f(k+1)). \tag{61}$$

As in Section 6, this implies that the MISE rate $n^{-1}$ is achievable as soon as $\pi_f$ has a finite second moment, that is, $\sum_{k\ge0}k^2\pi_f(k) < \infty$. Moreover, $\pi_f$ has a finite second moment for any $f$ in $\mathcal{C}(u,C,r)\cap\mathbb{H}_1$ whenever $u = (u_m)$ satisfies $\sum_mm^{3/2}u_m < \infty$. Indeed, as a simple consequence of the Cauchy-Schwarz inequality, $\pi_f(k) = O\big(k^{-1/2}(\sum_{\theta>k}f(\theta)^2)^{1/2}\big) = O(k^{-1/2}u_k)$ for such $f$. The interesting cases are thus those where $(u_m)$ decreases slowly, and Corollary 3 below provides such an example.
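For the uniform mixtures of this section, the projection estimator has the closed form $\hat f_{m,n}(k) = kP_n(I_{k-1}-I_k)\mathbb{1}(k\le m)$. The following small NumPy sketch assumes that the mixing mass is stored in an array indexed by $\theta-1$ and the empirical masses in an array indexed by $k$. Feeding it the exact $\pi_f$ in place of $P_n$ recovers $f$ on $\{1,\dots,m\}$.

```python
import numpy as np

def mixture_pmf(f):
    """pi_f(k) = sum_{theta > k} f(theta) / theta for k = 0, ..., len(f) - 1.

    f[j] is the mixing mass at theta = j + 1 (Theta = {1, 2, ...}).
    """
    theta = np.arange(1, len(f) + 1)
    terms = np.asarray(f, dtype=float) / theta
    return np.cumsum(terms[::-1])[::-1]        # entry k collects theta >= k + 1

def projection_estimator(pn, m):
    """f_hat_{m,n}(k) = k (P_n(k-1) - P_n(k)) for k = 1, ..., m (pn is zero-padded)."""
    pn = np.concatenate([np.asarray(pn, dtype=float), np.zeros(m + 1)])
    k = np.arange(1, m + 1)
    return k * (pn[k - 1] - pn[k])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f = np.array([0.5, 0.3, 0.2])                     # illustrative mixing pmf on {1, 2, 3}
    theta = rng.choice(np.arange(1, 4), size=50_000, p=f)
    x = rng.integers(0, theta)                        # X | theta uniform on {0, ..., theta - 1}
    print(projection_estimator(np.bincount(x) / x.size, 3))
```

The factor $k$ multiplying the increments of $P_n$ is the source of the $(k+1)^2$ weights in the variance term of (61).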

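Theorem 5. Let $u = (u_m)$ be a positive sequence decreasing to zero, $C$ a positive number and $r$ a positive integer. Then, for all sufficiently large $m$,

$$\inf_{\hat f\in\mathcal{S}_n}\sup_{f\in\mathcal{C}(u,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f-f\|_{\mathbb{H}}^2 \ge \Big(\frac{Cu_{m+1}}{2}\Big)^2\Big(1-\frac{\sqrt5\,Cu_{m+1}}{2m}\Big)^n, \tag{62}$$

and, for any integer $m \ge r$,

$$\sup_{f\in\mathcal{C}(u,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m,n}-f\|_{\mathbb{H}}^2 \le (Cu_m)^2 + \frac{2m^2}{n}. \tag{63}$$

Proof. The upper bound (63) follows from (61) and the bound $\sum_{k=0}^{m-1}(k+1)^2\pi_f(k) \le m^2$.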

The lower bound (62) is obtained along the same lines as the lower bound of Theorem 1. We apply Proposition 2 with $h(k) = \mathbb{1}(k < m)$,

$$V = \mathrm{Span}\{h(k)\pi_\cdot(k), k\in\mathcal{X}\} = \{f\in\ell^2(\Theta) : \text{there exists }\lambda\in\mathbb{R}\text{ such that }f(\theta) = \lambda\theta^{-1}\text{ for all }\theta \ge m\}$$

and $\mathcal{C}^* := V^\perp\cap\mathcal{C}(u,\tfrac12C,r)$. Thus, the assumptions of Proposition 2 are immediately satisfied. To arrive at (62), we still need to find a suitable probability density $f_0$ such that $f_0 + \mathcal{C}^* \subset \mathcal{C}(u,C,r)$, and a function $g$ in $\mathcal{C}^*$ such that $f_0 \pm g$ are in $\mathbb{H}_1$, in order to bound from below the supremum in the lower bound of Proposition 2.

To do this, let $g$ in $\ell^2(\Theta)$ be one of the two sequences satisfying the following three conditions:

(i) $g(\theta) = 0$ unless $\theta = m$, $m+1$ or $m+2$;

(ii) $\sum_{\theta=m}^{m+2}g(\theta) = 0$ and $\sum_{\theta=m}^{m+2}\theta^{-1}g(\theta) = 0$;

(iii) $\sum_{\theta=m}^{m+2}g^2(\theta) = (\tfrac12Cu_{m+1})^2$.

Then $\|g\|_{\mathbb{H}} = \tfrac12Cu_{m+1}$.

Let $f_0$ be such that $f_0(\theta) = |g(\theta)|$ for $\theta > 1$ and $f_0(1) = 1 - \sum_{\theta=m}^{m+2}f_0(\theta)$. Then, for $m$ sufficiently large, $\sum_{\theta=m}^{m+2}f_0(\theta) \le \sqrt3\,(\sum_{\theta=m}^{m+2}g^2(\theta))^{1/2} = \tfrac{\sqrt3}{2}Cu_{m+1}$ is less than one. Hence, $f_0$ belongs to $\mathbb{H}_1$ for large $m$. Using (60) and the fact that $(u_k)$ is nonincreasing, one readily checks that $f_0$ is in $\mathcal{C}(u,\tfrac12C,r)$. It is then immediate that $f_0 + \mathcal{C}(u,\tfrac12C,r) \subset \mathcal{C}(u,C,r)$, and thus also $f_0 + \mathcal{C}^* \subset \mathcal{C}(u,C,r)$.
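The perturbation $g$ and the density $f_0$ in this proof are explicit. The sketch below builds them numerically for given values of $m$, $C$ and a number standing in for $u_{m+1}$ (all illustrative). It uses the fact that the vectors in $\mathbb{R}^3$ satisfying both conditions in (ii) are the multiples of a cross product. It also compares the closed form $1 - f_0(m+1)/(m+1) - 2f_0(m+2)/(m+2)$ for $\pi_{f_0}\{0,\dots,m-1\}$ with a direct summation, and checks a Cauchy-Schwarz bound on that quantity.

```python
import numpy as np

def build_g_and_f0(m, C, u_next):
    """g on {m, m+1, m+2} satisfying (i)-(iii), and f_0 as in the proof.

    u_next plays the role of u_{m+1}; returns two dicts theta -> value.
    """
    theta = np.array([m, m + 1, m + 2], dtype=float)
    # (ii): g is orthogonal to (1, 1, 1) and to (1/theta); the cross product spans that null space
    direction = np.cross(np.ones(3), 1.0 / theta)
    # (iii): the sum of squares equals (C u_{m+1} / 2)^2; the other solution is -g
    values = 0.5 * C * u_next * direction / np.linalg.norm(direction)
    g = {m + j: values[j] for j in range(3)}
    f0 = {t: abs(v) for t, v in g.items()}
    f0[1] = 1.0 - sum(f0.values())
    return g, f0

def mass_below(f, m):
    """pi_f{0, ..., m-1}, summing pi_f(k) = sum_{theta > k} f(theta)/theta over k < m."""
    return sum(v / t for k in range(m) for t, v in f.items() if t > k)

if __name__ == "__main__":
    m, C, u_next = 50, 1.0, 0.3        # illustrative values
    g, f0 = build_g_and_f0(m, C, u_next)
    closed_form = 1 - f0[m + 1] / (m + 1) - 2 * f0[m + 2] / (m + 2)
    print(mass_below(f0, m), closed_form)
    # Cauchy-Schwarz: f0(m+1)/(m+1) + 2 f0(m+2)/(m+2) <= sqrt(5) ||g|| / m
    print(closed_form >= 1 - np.sqrt(5) * 0.5 * C * u_next / m)
```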

We now check that $g$ belongs to $\mathcal{C}^*$ and that $f_0 \pm g$ are in $\mathbb{H}_1$. The latter follows from $\sum_\theta g(\theta) = 0$ and $f_0 \ge |g|$. The former is also true, since $g$ belongs to $\mathcal{C}(u,\tfrac12C,r)$, which is checked as for $f_0$, and is perpendicular to $V$, which follows from item (i) and the second part of item (ii) in its definition.

Hence, Proposition 2 gives

$$\inf_{\hat f\in\mathcal{S}_n}\sup_{f\in\mathcal{C}(u,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f-f\|_{\mathbb{H}}^2 \ge \|g\|_{\mathbb{H}}^2\,\big(\pi_{f_0}\{0,1,\dots,m-1\}\big)^n.$$

We easily compute

$$\pi_{f_0}\{0,1,\dots,m-1\} = 1 - f_0(m+1)(m+1)^{-1} - 2f_0(m+2)(m+2)^{-1} \ge 1 - \frac{\sqrt5\,\|g\|_{\mathbb{H}}}{m} = 1 - \frac{\sqrt5\,Cu_{m+1}}{2m},$$

and (62) follows from the last two inequalities. □

Corollary 3. Let $\alpha$ and $C$ be two positive numbers and $r$ a nonnegative integer. Then the minimax MISE rate over the class $\mathcal{C}\big((\log^{-\alpha}(1+n))_{n\ge0},C,r\big)$ is $(\log n)^{-2\alpha}$. This rate is achieved by the projection estimator $\hat f_{m_n,n}$ with $m_n = [\tau n^\beta]$ for any positive number $\tau$ and any positive $\beta$ less than $1/2$. Moreover, this estimator is asymptotically MISE efficient up to a factor $4/\beta^{2\alpha}$.

Proof. Put $u_m = (\log(1+m))^{-\alpha}$. Use the lower bound (62) with $m = n$ to obtain the asymptotics $(Cu_{n+1}/2)^2$. Then use the upper bound (63), with $m_n = [\tau n^\beta]$ and $\beta\in(0,1/2)$, to obtain

$$\sup_{f\in\mathcal{C}(u,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m_n,n}-f\|_{\mathbb{H}}^2 \le (Cu_{m_n})^2 + \frac{2m_n^2}{n} = C^2u_{m_n}^2(1+o(1)).$$

Finally, $(Cu_{m_n})^2/(Cu_{n+1}/2)^2 \to 4/\beta^{2\alpha}$. The proof is complete. □
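The factor $4/\beta^{2\alpha}$ comes from comparing the two bounds. The sketch below assumes an upper bound of the form $(Cu_m)^2 + 2m^2/n$ evaluated at $m_n = [\tau n^\beta]$ and a lower-bound leading term $(Cu_{n+1}/2)^2$, with $u_m = (\log(1+m))^{-\alpha}$ and illustrative values of $C$, $\alpha$, $\tau$ and $\beta$. Because both bounds decay logarithmically, their ratio approaches $4/\beta^{2\alpha}$ only for astronomically large $n$.

```python
import math

def corollary3_bounds(n, C, alpha, tau, beta):
    """Upper bound (C u_m)^2 + 2 m^2 / n at m = [tau n^beta], and (C u_{n+1} / 2)^2.

    u_m = (log(1 + m))^(-alpha); assumes tau * n**beta >= 1 so that m >= 1.
    """
    def u(m):
        return math.log(1 + m) ** (-alpha)

    m_n = math.floor(tau * n ** beta)
    upper = (C * u(m_n)) ** 2 + 2 * m_n**2 / n
    lower = (C * u(n + 1) / 2) ** 2
    return upper, lower

if __name__ == "__main__":
    C, alpha, tau, beta = 1.0, 1.0, 1.0, 0.25   # illustrative values
    for n in (1e6, 1e12, 1e100, 1e300):
        up, lo = corollary3_bounds(n, C, alpha, tau, beta)
        print(f"n={n:.0e}  ratio={up / lo:.2f}  limit={4 / beta ** (2 * alpha):.2f}")
```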

8. Open problems. In Sections 5 and 6 we have investigated some particular cases for which (A1) is satisfied and, thus, both Theorem 1 and Theorem 2 apply. The upper bounds were, in fact, obtained directly without using Theorem 2, but they are essentially similar (see the remark following the proof of Theorem 2). We have also seen in Section 7 that this approach can be adapted in certain situations where (A1) does not hold. However, only in the case of power series mixtures and in the adaptation of Section 7 could we compute explicit lower and upper bounds giving rise to identical rates. In the case of discrete deconvolution of Section 6, we even gave an alternative lower bound which gives the optimal rate in the degenerate case where the projection estimator $\hat f_{m,n}$ can be defined for $m = \infty$. In this section we outline a few open problems to which the framework of the present paper is potentially applicable, but for which we do not attempt to compute the bounds.



8.1. Multivariate power series. Proposition 6 shows that Theorem 1 applies for a large range of dominating measures $\nu$. Condition (i) is a simple reformulation of the requirement that $\pi_\cdot(k)$ be both in $L^1(\nu)$ and in $L^2(\nu)$ for all $k$, and it easily generalizes to other mixands. Condition (ii) also generalizes when $\pi_\cdot(k)$ is related to a well-known sequence of linearly independent functions (here the polynomials). For instance, it trivially generalizes to multivariate power series mixands (see [13], Chapter 38). A slightly different setting concerns the bivariate Poisson distributions ([13], Chapter 37) given, for all $\theta = (\theta_1,\theta_2,\theta_{12})$ in $\Theta := (0,\infty)^3$ and all nonnegative integers $x_1$ and $x_2$, by

$$\pi_\theta(x_1,x_2) = e^{-(\theta_1+\theta_2+\theta_{12})}\sum_{i=0}^{x_1\wedge x_2}\frac{\theta_1^{x_1-i}\,\theta_2^{x_2-i}\,\theta_{12}^{i}}{(x_1-i)!\,(x_2-i)!\,i!}.$$

In this case (i) is modified to $\int_\Theta|\theta|^ke^{-(\theta_1+\theta_2+\theta_{12})}\,\nu(d\theta) < \infty$ and (ii) is unchanged. This is easily seen upon observing that $\{e^{\theta_1+\theta_2+\theta_{12}}\pi_\theta(x_1,x_2)\}_{x_1,x_2\ge0}$ is a collection of trivariate polynomials in $\theta = (\theta_1,\theta_2,\theta_{12})$ that are linearly independent, as the term $\theta_1^{x_1}\theta_2^{x_2}$ only appears in $\pi_\theta(x_1,x_2)$. The rest of the proof above applies similarly.
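The bivariate Poisson probabilities above are easy to evaluate directly. The sketch below codes the displayed formula and checks numerically that the probabilities sum to one and that the first marginal is Poisson with mean $\theta_1+\theta_{12}$, a standard property of this distribution. It also lists the monomials of $e^{\theta_1+\theta_2+\theta_{12}}\pi_\theta(x_1,x_2)$, which illustrates why the term $\theta_1^{x_1}\theta_2^{x_2}$ singles out $(x_1,x_2)$. The parameter values are arbitrary.

```python
import math

def bivariate_poisson_pmf(x1, x2, th1, th2, th12):
    """pi_theta(x1, x2) for theta = (th1, th2, th12), following the displayed formula."""
    s = 0.0
    for i in range(min(x1, x2) + 1):
        s += (th1 ** (x1 - i) * th2 ** (x2 - i) * th12 ** i
              / (math.factorial(x1 - i) * math.factorial(x2 - i) * math.factorial(i)))
    return math.exp(-(th1 + th2 + th12)) * s

def monomial_exponents(x1, x2):
    """Exponents (of th1, th2, th12) of the monomials in exp(th1 + th2 + th12) * pi_theta(x1, x2)."""
    return {(x1 - i, x2 - i, i) for i in range(min(x1, x2) + 1)}

if __name__ == "__main__":
    th = (1.0, 0.5, 0.7)   # arbitrary parameter values
    total = sum(bivariate_poisson_pmf(a, b, *th) for a in range(40) for b in range(40))
    print(total)
    print(sorted(monomial_exponents(2, 3)))
```

Only the $i = 0$ term produces the exponent $(x_1, x_2, 0)$, so no other pair $(x_1', x_2')$ contains that monomial.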

Although Theorems 1 and 2 apply, the rates they provide are not known 
explicitly. 



8.2. Power series mixing distributions with noncompact support. Let again $\pi_\theta$ be the Poisson distribution with mean $\theta$, and let $\Theta = \mathbb{R}_+$. Thus, we have power series mixands with $a_k = 1/k!$, but with $\Theta$ unbounded. Take $\nu$ as Lebesgue measure, so that $\mathbb{H} = L^2(\mathbb{R}_+)$. Applying (12) [see also (35)], we find that, for all $f$ in $\mathbb{H}_1$, the variance term in Proposition 3 is bounded by

$$\mathbb{E}_f\|\hat f_{m,1}\|_{\mathbb{H}}^2 = \mathbb{E}_f\Big(\sum_{k=0}^{m}\zeta_{k,X}^2\Big) \le \sum_{0\le l\le k\le m}\zeta_{k,l}^2, \tag{64}$$

which should be compared to (50). For $\Lambda_1 > 2+\sqrt{17}/2$, this bound, along with Lemma A.2 in the bias-variance decomposition, shows that, for any positive number $C$, any nonnegative integer $r$ and any sequences $(u_n)$ and $(m_n)$ satisfying $\Lambda_1^{m_n}/u_{m_n} = o(n^{1/2})$,

$$\limsup_{n\to\infty}u_{m_n}^{-2}\sup_{f\in\mathcal{C}(u,C,r)\cap\mathbb{H}_1}\mathbb{E}_f\|\hat f_{m_n,n}-f\|_{\mathbb{H}}^2 \le \limsup_{n\to\infty}u_{m_n}^{-2}\big(C^2u_{m_n}^2 + n^{-1}K_1\Lambda_1^{2m_n}\big) = C^2.$$

For the sequences $u^\alpha$, the resulting root MISE rate is $(\log n)^{-\alpha}$, obtained by choosing $m_n = [\tau\log n]$ for small $\tau$. This is better than the rate obtained when $b < \infty$ (see Corollary 2).

Concerning lower bounds on the MISE, Loh and Zhang [17] give one in their Theorem 4 over particular classes related to ours, but their assumptions do not apply in the case considered here, because they correspond to a weight function $w = \mathbb{1}_{\mathbb{R}_+}$ with infinite norm. Hence, it remains to be determined whether the logarithmic rate of the projection estimator is optimal in this case. Theorem 1 applies, and Proposition 5 shows that $\mathcal{C}(u,C,r)\cap\mathbb{H}_1$ is nonempty for a positive $r$ or for large $C$. The next problem, which we have not solved, consists rather in finding $f_0$ in this intersection such that $u_m = O(1/K_{\infty,f_0}(V_{m+2}\ominus V_m))$, which is the key step in our method for showing that the minimax MISE rate is $(u_{m_n}^2)$.

8.3. Discrete deconvolution with vanishing characteristic function. Let us return to the setting of Section 6. If condition (56) is not satisfied, our analysis must be refined. It is indeed possible that the optimal rate is slower in this case, and the behavior of $p^*$ at its zeros may then determine the optimal rate of convergence.

To our knowledge, this problem has not been studied. A possible approach would be to mimic that of Section 5.1. One would observe that the projection estimator can easily be defined using a sequence of orthonormal trigonometric polynomials in $L^2(\nu')$ with $\nu'(dt) = |p^*|^2(t)\,\mathbb{1}_{(-\pi,\pi]}(t)\,dt$, and then express $\hat f_{m,n}$ using such a sequence. However, in contrast to power series mixands, the behavior of the projection estimator here should be driven by the measure $\nu'$, that is, by precise assumptions on the zeros of $p^*$ when (56) fails.
APPENDIX

Recurrence relations for orthonormal polynomials. In this appendix we give some further results for the orthogonal polynomials $(q_k)$ in $\mathbb{H}$ and $(q'_k)$ in $\mathbb{H}'$ introduced in Section 5.1. For any measure $\nu_0$ on $\mathbb{R}$, one can construct an orthogonal sequence of polynomials $(r_k)_{k\ge0}$ with increasing degrees in $L^2(\nu_0)$ using a so-called three-term recurrence relation,

$$r_{k+1}(t) = (t-\alpha_k)\,r_k(t) - \beta_k\,r_{k-1}(t)\quad\text{with }r_{-1} = 0\text{ and }r_0 = 1,$$

where $(\alpha_k)_{k\ge0}$ and $(\beta_k)_{k\ge0}$ are sequences depending on $\nu_0$. Moreover, putting $\beta_0 = \|r_0\|^2_{L^2(\nu_0)}$ ([10], equations (1.13)),

$$N_k := \|r_k\|_{L^2(\nu_0)} = \Big(\prod_{j=0}^{k}\beta_j\Big)^{1/2}\quad\text{for all }k \ge 0. \tag{65}$$

Let the polynomials have coefficients $r_k(t) = \sum_lR_{k,l}t^l$ and put

$$Q^{(\nu_0)}_{k,l} = R_{k,l}/N_k\quad\text{for all }k,l \ge 0. \tag{66}$$

These are the coefficients of an orthonormal sequence; they coincide with $Q'_{k,l}$ and $Q_{k,l}$ when $\nu_0$ equals $\nu'$ and $\nu$, respectively. The three-term recurrence relation can be written

$$R_{k+1,l} = R_{k,l-1} - \alpha_kR_{k,l} - \beta_kR_{k-1,l}\quad\text{for all }k,l \ge 0,\text{ with }R_{0,0} = 1, \tag{67}$$

and the convention $R_{k,l} = 0$ if $l < 0$ or $l > k$. Hence, by (65), (67) and (66), knowledge of $(\alpha_k)_{k\ge0}$ and $(\beta_k)_{k\ge0}$ provides a simple algorithm for computing the coefficients $Q^{(\nu_0)}_{k,l}$ recursively at a low computational cost. Let us now derive the coefficients $\alpha_k$ and $\beta_k$ for particular choices of $\nu_0$.
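The recursion (65)-(67) translates directly into code. The following NumPy sketch computes the matrix $(Q^{(\nu_0)}_{k,l})$ from given $(\alpha_k)$ and $(\beta_k)$. It checks orthonormality for the Legendre coefficients of A.1 below ($\alpha_k = 0$, $\beta_0 = 2$, $\beta_k = 1/(4-1/k^2)$) using Gauss-Legendre quadrature from `numpy.polynomial.legendre.leggauss`. Working with monomial coefficients, as the text does, is adequate for moderate degrees but can become ill-conditioned for large $k$.

```python
import numpy as np

def orthonormal_coefficients(alpha, beta, m):
    """Return Q with Q[k, l] = R_{k,l} / N_k for 0 <= k, l <= m.

    alpha needs at least m entries and beta at least m + 1 entries.
    R follows the three-term recurrence (67) with R_{0,0} = 1, and
    N_k = (beta_0 * ... * beta_k)^{1/2} as in (65).
    """
    R = np.zeros((m + 1, m + 1))
    R[0, 0] = 1.0
    for k in range(m):
        R[k + 1, 1:] += R[k, :-1]                 # R_{k,l-1}: multiplication by t
        R[k + 1, :] -= alpha[k] * R[k, :]         # -alpha_k R_{k,l}
        if k >= 1:
            R[k + 1, :] -= beta[k] * R[k - 1, :]  # -beta_k R_{k-1,l}
    N = np.sqrt(np.cumprod(beta[: m + 1]))
    return R / N[:, None]

def legendre_recurrence(m):
    """Coefficients of A.1: alpha_k = 0, beta_0 = 2, beta_k = 1 / (4 - 1/k^2)."""
    k = np.arange(1, m + 1, dtype=float)
    return np.zeros(m), np.concatenate([[2.0], 1.0 / (4.0 - 1.0 / k**2)])

if __name__ == "__main__":
    m = 10
    Q = orthonormal_coefficients(*legendre_recurrence(m), m)
    x, w = np.polynomial.legendre.leggauss(m + 1)     # exact up to degree 2m + 1
    P = Q @ np.vander(x, m + 1, increasing=True).T    # P[k, i] = q_k(x_i)
    gram = (P * w) @ P.T
    print(np.max(np.abs(gram - np.eye(m + 1))))       # near machine precision
```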

A.1. Legendre polynomials. Let $\nu_0$ be Lebesgue measure on $[-1,1]$. In this case $\alpha_k = 0$, $\beta_0 = 2$ and $\beta_k = 1/(4-1/k^2)$ for $k \ge 1$ ([10], equation (2.1)).

A.2. Translated-scaled Legendre polynomials. Let $\nu_0$ be Lebesgue measure on an interval $[a,b]$. Denote $\mu := (a+b)/2$ and $\delta := (b-a)/2$. Replacing $t$ by $(u-\mu)/\delta$ in the three-term recurrence relation for Legendre polynomials, one obtains an orthogonal sequence for $\nu_0$ satisfying

$$r_{k+1}\Big(\frac{u-\mu}{\delta}\Big) = \frac{u-\mu}{\delta}\,r_k\Big(\frac{u-\mu}{\delta}\Big) - \beta_k\,r_{k-1}\Big(\frac{u-\mu}{\delta}\Big),$$

with $\beta_k$ as in A.1. Multiplying by $\delta^{k+1}$ and identifying $(\delta^kr_k((u-\mu)/\delta))_k$ with a new orthogonal sequence, the previous equation gives the following coefficients in this case: $\alpha_k = \mu$, $\beta_0 = 2\delta$ and $\beta_k = \delta^2/(4-1/k^2)$ for $k \ge 1$.

The following result serves for bounding the variance of $\hat f_{m,n}$ [see (34)] when $\nu = \nu_0$.
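Lemma A.1. Let $\Lambda_0$ be a number larger than $\Lambda := \gamma + \sqrt{\gamma^2+1}$ with $\gamma = (2+a+b)/(b-a)$. Then $\sum_l(Q^{(\nu)}_{k,l})^2 = O(\Lambda_0^{2k})$.

Proof. It follows from (67) that

$$\|R_{k+1}\| \le (1+|\alpha_k|)\,\|R_k\| + \beta_k\,\|R_{k-1}\|,$$

where $\|R_k\| := (\sum_lR_{k,l}^2)^{1/2}$. Consequently, dividing by $N_{k+1}$ as given above and using the same notation for $Q^{(\nu)}_k$, we obtain

$$\|Q^{(\nu)}_{k+1}\| \le \frac{1+|\alpha_k|}{\sqrt{\beta_{k+1}}}\,\|Q^{(\nu)}_k\| + \sqrt{\frac{\beta_k}{\beta_{k+1}}}\,\|Q^{(\nu)}_{k-1}\|. \tag{68}$$

Note that

$$\frac{1+|\alpha_k|}{\sqrt{\beta_{k+1}}} \le \frac{2(1+|\mu|)}{\delta}\ \text{ for all }k \ge 0\qquad\text{and}\qquad\lim_{k\to\infty}\sqrt{\frac{\beta_k}{\beta_{k+1}}} = 1,$$

and that the positive solution of the quadratic equation

$$x^2 - \frac{2(1+|\mu|)}{\delta}\,x - 1 = 0\quad\text{with }\mu = \frac{a+b}{2}\text{ and }\delta = \frac{b-a}{2} \tag{69}$$

is $\Lambda$. The lemma follows. □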
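Lemma A.1 can be probed numerically. The sketch below is self-contained, so it repeats the recursion. It uses the coefficients of A.2 on the illustrative interval $[a,b] = [1,2]$ and checks orthonormality with Gauss-Legendre quadrature mapped to $[a,b]$. It then reports the successive growth factors $\|Q_{k+1}\|/\|Q_k\|$ of the coefficient vectors, which in this example stay below $\Lambda = \gamma+\sqrt{\gamma^2+1}$. This is an illustration only and does not replace the proof.

```python
import numpy as np

def orthonormal_coefficients(alpha, beta, m):
    """Q[k, l] = R_{k,l} / N_k from the recurrence (67) and the norms (65)."""
    R = np.zeros((m + 1, m + 1))
    R[0, 0] = 1.0
    for k in range(m):
        R[k + 1, 1:] += R[k, :-1]
        R[k + 1, :] -= alpha[k] * R[k, :]
        if k >= 1:
            R[k + 1, :] -= beta[k] * R[k - 1, :]
    return R / np.sqrt(np.cumprod(beta[: m + 1]))[:, None]

def shifted_legendre_recurrence(a, b, m):
    """A.2: alpha_k = mu, beta_0 = 2 delta, beta_k = delta^2 / (4 - 1/k^2)."""
    mu, delta = (a + b) / 2, (b - a) / 2
    k = np.arange(1, m + 1, dtype=float)
    return np.full(m, mu), np.concatenate([[2 * delta], delta**2 / (4.0 - 1.0 / k**2)])

def gram_on_interval(Q, a, b):
    """Gram matrix in L^2([a, b], dt) of the polynomials whose coefficient rows are Q."""
    m = Q.shape[0] - 1
    x, w = np.polynomial.legendre.leggauss(m + 1)
    mu, delta = (a + b) / 2, (b - a) / 2
    t, wt = mu + delta * x, delta * w                  # map nodes and weights to [a, b]
    P = Q @ np.vander(t, m + 1, increasing=True).T
    return (P * wt) @ P.T

def growth_factors(a, b, m):
    """Ratios ||Q_{k+1}|| / ||Q_k|| of coefficient vectors, and Lambda of Lemma A.1."""
    Q = orthonormal_coefficients(*shifted_legendre_recurrence(a, b, m), m)
    norms = np.sqrt(np.sum(Q**2, axis=1))
    gamma = (2 + a + b) / (b - a)
    return norms[1:] / norms[:-1], gamma + np.sqrt(gamma**2 + 1)

if __name__ == "__main__":
    a, b = 1.0, 2.0
    Q6 = orthonormal_coefficients(*shifted_legendre_recurrence(a, b, 6), 6)
    print(np.max(np.abs(gram_on_interval(Q6, a, b) - np.eye(7))))
    ratios, lam = growth_factors(a, b, 60)
    print(ratios[-5:], lam)
```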

A.3. Laguerre measure. Let $\nu_0$ be the measure with density $e^{-t}$ with respect to Lebesgue measure over $t \in (0,\infty)$.

In this case $\alpha_k = 2k+1$, $\beta_0 = 1$ and $\beta_k = k^2$ for $k \ge 1$ ([10], equation (2.4)).
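As a check on A.3, the same recursion with $\alpha_k = 2k+1$, $\beta_0 = 1$ and $\beta_k = k^2$ should reproduce the classical Laguerre polynomials $L_k$ up to sign, whose coefficients are $(-1)^l\binom{k}{l}/l!$. The sketch below verifies this, and also checks orthonormality under $e^{-t}\,dt$ via Gauss-Laguerre quadrature from `numpy.polynomial.laguerre.laggauss`.

```python
import math
import numpy as np

def orthonormal_coefficients(alpha, beta, m):
    """Q[k, l] = R_{k,l} / N_k from the recurrence (67) and the norms (65)."""
    R = np.zeros((m + 1, m + 1))
    R[0, 0] = 1.0
    for k in range(m):
        R[k + 1, 1:] += R[k, :-1]
        R[k + 1, :] -= alpha[k] * R[k, :]
        if k >= 1:
            R[k + 1, :] -= beta[k] * R[k - 1, :]
    return R / np.sqrt(np.cumprod(beta[: m + 1]))[:, None]

def laguerre_recurrence(m):
    """A.3: alpha_k = 2k + 1, beta_0 = 1, beta_k = k^2."""
    k = np.arange(1, m + 1, dtype=float)
    return 2.0 * np.arange(m) + 1.0, np.concatenate([[1.0], k**2])

if __name__ == "__main__":
    m = 6
    Q = orthonormal_coefficients(*laguerre_recurrence(m), m)
    x, w = np.polynomial.laguerre.laggauss(m + 1)     # weight e^{-t} on (0, inf)
    P = Q @ np.vander(x, m + 1, increasing=True).T
    print(np.max(np.abs((P * w) @ P.T - np.eye(m + 1))))
    # |Q[k, l]| should equal binom(k, l) / l!, the classical Laguerre coefficients
    print(abs(Q[3, :4]), [math.comb(3, l) / math.factorial(l) for l in range(4)])
```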

A.4. Squared Laguerre measure. Finally, let $\nu_0$ be the measure with density $e^{-2t}$ with respect to Lebesgue measure over $t \in (0,\infty)$.

This corresponds to $\nu'$ in the case of Poisson mixands, $a_k = 1/k!$ and $Z^2(t) = e^{-2t}$. We have a result corresponding to Lemma A.1, but we now study $(\zeta_{k,l})$ rather than $(Q_{k,l})$, as was done in the case of compact support and Lebesgue measure, since our interest lies in the coefficients $\zeta_{k,l} = Q'_{k,l}/a_l$ (see Section 5.1).

Lemma A.2. Let $\Lambda_1$ be a number larger than $2+\sqrt{17}/2$. Then $\sum_l\zeta_{k,l}^2 = \sum_l(Q'_{k,l}/a_l)^2 = O(\Lambda_1^{2k})$.



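Proof. Dividing (67) by $a_l$ and then by $N_{k+1}$, one obtains

$$\Big(\sum_l\Big(\frac{Q'_{k+1,l}}{a_l}\Big)^2\Big)^{1/2} \le \frac{\rho_k+|\alpha_k|}{\sqrt{\beta_{k+1}}}\Big(\sum_l\Big(\frac{Q'_{k,l}}{a_l}\Big)^2\Big)^{1/2} + \sqrt{\frac{\beta_k}{\beta_{k+1}}}\Big(\sum_l\Big(\frac{Q'_{k-1,l}}{a_l}\Big)^2\Big)^{1/2}, \tag{70}$$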




F. ROUEFF AND T. RYDEN 



where $\rho_k := \max_{0<l\le k}(a_{l-1}/a_l)$. By arguments similar to those used for the translated-scaled Legendre polynomials, $\alpha_k = k+1/2$, $\beta_0 = 1/2$ and $\beta_k = k^2/4$ for $k \ge 1$. Since $a_k = 1/k!$, $\rho_k = k$ in (70). Moreover,



Hence the result. □ 
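For the squared Laguerre measure, one can compute $\zeta_{k,l} = Q'_{k,l}/a_l$ with $a_l = 1/l!$ and follow the growth of $\sum_l\zeta_{k,l}^2$. The sketch below is a numerical illustration only, using the coefficients stated in the proof ($\alpha_k = k+1/2$, $\beta_0 = 1/2$, $\beta_k = k^2/4$). Here the orthonormal polynomials are $\pm\sqrt2\,L_k(2t)$, where $L_k$ is the classical Laguerre polynomial, so that $|\zeta_{k,l}| = \sqrt2\binom{k}{l}2^l$; the code checks this identity. In this run, the growth factor of $(\sum_l\zeta_{k,l}^2)^{1/2}$ is close to 3, below the threshold $2+\sqrt{17}/2$ of Lemma A.2.

```python
import math
import numpy as np

def orthonormal_coefficients(alpha, beta, m):
    """Q[k, l] = R_{k,l} / N_k from the recurrence (67) and the norms (65)."""
    R = np.zeros((m + 1, m + 1))
    R[0, 0] = 1.0
    for k in range(m):
        R[k + 1, 1:] += R[k, :-1]
        R[k + 1, :] -= alpha[k] * R[k, :]
        if k >= 1:
            R[k + 1, :] -= beta[k] * R[k - 1, :]
    return R / np.sqrt(np.cumprod(beta[: m + 1]))[:, None]

def squared_laguerre_recurrence(m):
    """A.4: alpha_k = k + 1/2, beta_0 = 1/2, beta_k = k^2 / 4."""
    k = np.arange(1, m + 1, dtype=float)
    return np.arange(m) + 0.5, np.concatenate([[0.5], k**2 / 4.0])

def zeta_matrix(m):
    """zeta[k, l] = Q'_{k,l} / a_l with Poisson weights a_l = 1/l!."""
    Qp = orthonormal_coefficients(*squared_laguerre_recurrence(m), m)
    fact = np.array([math.factorial(l) for l in range(m + 1)], dtype=float)
    return Qp * fact          # dividing by a_l = 1/l! multiplies column l by l!

if __name__ == "__main__":
    m = 20
    zeta = zeta_matrix(m)
    S = np.sum(zeta**2, axis=1)              # sum_l zeta_{k,l}^2
    growth = np.sqrt(S[1:] / S[:-1])
    print(growth[-3:], 2 + math.sqrt(17) / 2)
```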